I'm developing Quality Management software that shows the errors of a provisioning tool.
To do that, I read all errors from an XML file and group them.
However, some errors contain a timestamp, for example: "Forcedtime 1.08.2016 17:51:00 is in the past"
Is it possible to find date values like this in a string and delete them?
I can't use a hard-coded Replace, because the date/time values are different every time.
Thanks for helping me
Yet another suggestion is to read the XML as such and discard the nodes you don't need, then process only the nodes that are left.
While a regex will do the trick for replacing something in the text, it might not be the best fit, because this is structured data rather than a bunch of values in a single string.
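A minimal LINQ to XML sketch of that idea. The XML layout (element names Errors, Error, Code and Time) is purely hypothetical, and it assumes the timestamp is also available in its own node, which the question does not confirm. Once the volatile node is removed, errors that only differed in their time serialize identically and can be grouped:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical error file: the timestamp sits in its own <Time> node.
const string xml = @"<Errors>
  <Error><Code>ForcedTimeInPast</Code><Time>1.08.2016 17:51:00</Time></Error>
  <Error><Code>ForcedTimeInPast</Code><Time>2.08.2016 09:10:00</Time></Error>
  <Error><Code>MissingProfile</Code></Error>
</Errors>";

XDocument doc = XDocument.Parse(xml);

// Discard the nodes that differ between otherwise identical errors.
doc.Descendants("Time").Remove();

// Group on whatever is left of each <Error> element.
var counts = doc.Descendants("Error")
    .GroupBy(e => e.ToString(SaveOptions.DisableFormatting))
    .ToDictionary(g => g.Key, g => g.Count());

foreach (var entry in counts)
    Console.WriteLine($"{entry.Value} x {entry.Key}");
```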
(0?[1-9]|[12][0-9]|3[01])\.(0?[1-9]|1[0-2])\.(\d{4}) ([01]?[0-9]|2[0-3]):([0-5]?[0-9]):([0-5]?[0-9])
It matches a date/time such as 1.08.2016 17:51:00 (day.month.year followed by a 24-hour time).
Run it against your string and, wherever it matches, replace the match, for example with an empty string.
Regular expressions are well worth learning for this kind of task.
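A minimal C# sketch of that replacement, assuming dates are always written day.month.year followed by a 24-hour time; the helper name StripDateTimes is hypothetical. It also collapses the double space left where the date was removed, so messages that differ only in their timestamp become identical and can be grouped:

```csharp
using System;
using System.Text.RegularExpressions;

// Day.Month.Year Hour:Minute:Second, e.g. "1.08.2016 17:51:00".
// Single-digit day, month, hour, minute and second are accepted too.
var dateTimePattern = new Regex(
    @"\b(0?[1-9]|[12][0-9]|3[01])\.(0?[1-9]|1[0-2])\.\d{4} ([01]?[0-9]|2[0-3]):[0-5]?[0-9]:[0-5]?[0-9]\b");

// Removes every date/time from an error message (hypothetical helper).
string StripDateTimes(string message)
{
    string withoutDates = dateTimePattern.Replace(message, string.Empty);
    // Collapse the run of spaces left behind where the date used to be.
    return Regex.Replace(withoutDates, @"\s{2,}", " ").Trim();
}

Console.WriteLine(StripDateTimes("Forcedtime 1.08.2016 17:51:00 is in the past"));
// prints "Forcedtime is in the past"
```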
I am reading a header from a file which has time fields, for example Time (UTC +1). I then need to compare this with a list of stored headers to work out whether the file is valid. However, my stored headers are also used for writing, so they allow flexibility on the time zone by being written like this: Time (UTC {0}).
I would like to know the best way of dealing with this as flexibly as possible. The only way I can imagine is getting the position of the { and comparing only up to that. That is fine in this case, but not if there are words after the parameter that matter more than a closing bracket.
EDIT: I would like to give some context so I can explain better how flexible this needs to be. I possibly didn't emphasise that I don't want it to JUST work with the time field.
I am trying to write a very flexible system. I store a list of valid headings and use them to find out which value to read from or write to the CSV file. It is flexible and easily maintainable, and I want to keep it neat. I want to write a function that takes a string containing one or more parameters and compares it with a value that has had the parameters filled in (like the example with the Time header). In the future I may have a field for temperature in a particular place, so my stored heading would be Temperature in {0}({1}), which, when read back, would be Temperature in Britain(c) or Temperature in America(f).
You could use a regex like this one to recognise the header as it is read from the file:
string pattern = @"^Time \(UTC [+-]?\d+\)$";
Regex rgx = new Regex(pattern);
Regex has an IsMatch method you can use to check whether a string matches the pattern you provided (use Match if you also need the matched text).
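To cover any stored heading rather than just the Time one, the regex can be built from the template itself. This is a sketch, not part of the original answer; TemplateToRegex is a hypothetical name. It assumes placeholders appear in order ({0}, {1}, ...) and that templates contain no escaped braces ({{ or }}). Literal text is escaped and each placeholder becomes a lazy capture group:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Turns a stored header template such as "Temperature in {0}({1})" into an anchored regex.
Regex TemplateToRegex(string template)
{
    // Text between placeholders is taken literally.
    string[] literals = Regex.Split(template, @"\{\d+\}");
    // Each {n} becomes a lazy capture group; groups are numbered by position.
    string body = string.Join("(.+?)", literals.Select(s => Regex.Escape(s)));
    return new Regex("^" + body + "$");
}

Match m = TemplateToRegex("Temperature in {0}({1})").Match("Temperature in Britain(c)");
Console.WriteLine(m.Success);          // True
Console.WriteLine(m.Groups[1].Value);  // Britain
Console.WriteLine(m.Groups[2].Value);  // c
```

Because the whole header is anchored with ^ and $, words after a placeholder are compared just as strictly as the text before it.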
I have a lot of text data with differing structures. I need to extract parts of these texts based on some text-based rules. I would use regular expressions, but unfortunately the people using the application have never heard of them.
Basically, the app does the following:
Load the data into a textbox
Type the structure of the output as a simple set of rules into another textbox
Receive the results in a 3rd textbox
Examples of data structures (I have megabytes of this data):
Label1: value1, measurement
Label2; value2; something else
Nr, value3 (comment)
...
I need some other approach that I could use instead of regular expressions. It can be extremely simple because all I need is one value from every row.
From the example above I have to obtain the following structure:
"value1, value2, value3"
Is there a simpler alternative to regex? Did someone already implement something like this?
I can also imagine that I am approaching the problem from the wrong angle by forcing non-technical users to write data extraction rules. In that case the question becomes more generic: "How can I build an application that lets a non-technical user extract data from separate texts?"
Edit:
I have implemented the following matching for them, which is as simple as I could make it:
File content:
"Strain at break Ax2";"Unknown"
"Strain at break Ax1";"Unknown"
"Strain at break";"Unknown"
"Yield point strain";"Unknown"
"Uniform elongation";25.4087;"%"
"Tensile strength";261.323;"MPa"
"End test phase Yield point";1;"%"
"Maximum tensile force";5.22647;"kN"
Pattern:
"Tensile strength";(?<value>[^;\n]*);
"Maximum tensile force";(?<value>[^;\n]*);
Still too complex. The problem is if I start replacing the ugly part with another string to obtain for example:
"Tensile strength", [First value after]
I lose all the generic nature of the extraction, because every file looks different from this one.
Take a look at the FileHelpers library. It allows runtime generation of file layouts and I think the one that would help in your example is the DelimitedClassBuilder.
In your case, I'd probably use FileHelpers to parse the record definitions into the DelimitedClassBuilder and then use the result to parse your records.
I solved the issue by defining the rules as regular expressions. Once the rules were defined, I wrapped them in a rule set that was easier for the users to read.
For example, to extract the value from the line
Maximum amount of Sheet Drawing Force= 35.659695[kN]
I defined the regular expression
{0}=\s*(?<value>[^[\n\r]*)
then let the user define the name of the field. The {0} placeholder was then replaced with the name of the field and the regular expression applied.
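A minimal C# sketch of that substitution step, reusing the regex template from this answer; TryExtractValue is a hypothetical name. The field name is passed through Regex.Escape so characters such as '(' or '.' in a user-entered name are matched literally, and string.Replace is used instead of string.Format so that templates containing quantifiers like \d{4} are not broken:

```csharp
using System;
using System.Text.RegularExpressions;

// Rule template: {0} is the placeholder for the user-supplied field name.
const string ruleTemplate = @"{0}=\s*(?<value>[^[\n\r]*)";

// Finds "<fieldName>=" in the text and returns what follows it, up to a '[' or the end of the line.
bool TryExtractValue(string text, string fieldName, out string value)
{
    string pattern = ruleTemplate.Replace("{0}", Regex.Escape(fieldName));
    Match match = Regex.Match(text, pattern);
    value = match.Success ? match.Groups["value"].Value : string.Empty;
    return match.Success;
}

string line = "Maximum amount of Sheet Drawing Force= 35.659695[kN]";
if (TryExtractValue(line, "Maximum amount of Sheet Drawing Force", out string force))
    Console.WriteLine(force); // 35.659695
```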
I'm pretty sure it has been asked before, but I could not find anything good.
I'm trying to parse a log but am having trouble with it.
At first it looked pretty easy, because the log is built like this:
thing,thing,thing,thing
so I split the string on the ,
However, a , can also appear inside a value itself, and this is where I don't know what to do anymore.
How would I successfully parse this kind of log?
Edit:
Here is a log example:
1326139200953,info,,0,"str value which may contain, ",,,0
1326139201109,info,,0,"str value which may contain, ",,,0
1326139201265,info,,0,"str value which may contain, ",,,0
1326139201999,start,,0,,,,0
1326139368296,new,F:\Dir\Dir\file.txt,1536,,0,,0
If your log file doesn't have field encapsulators, the fields have variable width, and the separator/delimiter can also appear in a field, then it's likely you can't program something that will work in all cases.
Can you supply an example of your log file data? It may be possible to match the parts you need with a regex.
Unfortunately I think your question is not answerable in its current state, please provide more info.
Edit: Thanks for updating the question, you do have field encapsulators (double quotes). This will make it easier!
I think there are many ways to do this. Personally, I would carry on splitting on commas, but then loop over the resulting array, checking whether the first character of each value is a double quote. If it is, you need to join it to the array item after it (re-inserting the comma the split removed). If the last character of the joined item isn't a double quote, continue joining until you've closed the opening double quote.
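A minimal C# sketch of that split-then-rejoin loop; SplitRespectingQuotes is a hypothetical name. It assumes quotes only ever wrap whole fields and that values contain no escaped quotes (""), which holds for the log sample in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

List<string> SplitRespectingQuotes(string line)
{
    var fields = new List<string>();
    var current = new StringBuilder();
    bool insideQuotes = false;

    foreach (string part in line.Split(','))
    {
        if (insideQuotes)
        {
            // Re-insert the comma that Split removed, then check for the closing quote.
            current.Append(',').Append(part);
            if (part.EndsWith("\""))
            {
                fields.Add(current.ToString());
                current.Clear();
                insideQuotes = false;
            }
        }
        else if (part.StartsWith("\"") && (part.Length == 1 || !part.EndsWith("\"")))
        {
            // A quoted value was opened but not closed in this piece.
            current.Append(part);
            insideQuotes = true;
        }
        else
        {
            fields.Add(part);
        }
    }

    if (insideQuotes) fields.Add(current.ToString()); // unterminated quote: keep what we have
    return fields;
}

var fields = SplitRespectingQuotes("1326139200953,info,,0,\"str value which may contain, \",,,0");
Console.WriteLine(fields.Count); // 8
```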
There's certainly a better way so you may wish to wait for another solution.
Edit 2: Give this a go and let me know how you get on:
string myRegex = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*),";
string[] outputArray = Regex.Split(myStr, myRegex);
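A runnable usage sketch of this regex on the first line of the sample log. The lookbehind only allows a split on commas preceded by an even number of double quotes, i.e. commas outside a quoted value; the extra Trim('"') step is an addition that strips the encapsulating quotes (it would also strip quotes that legitimately start or end a value):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Split on commas that are not inside a double-quoted value.
const string csvComma = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*),";

string logLine = "1326139200953,info,,0,\"str value which may contain, \",,,0";
string[] parts = Regex.Split(logLine, csvComma)
    .Select(f => f.Trim('"'))   // drop the encapsulating quotes
    .ToArray();

Console.WriteLine(parts.Length); // 8
Console.WriteLine(parts[4]);     // the quoted value, without its quotes
```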
I have an XML file with two properties: word and link.
How can I replace words in a text with links, using the information in the XML?
Ex.:
XML
<word>dog</word>
<link>http://www.dog.com</link>
Text: The dog is nice.
Result: The <a href="http://www.dog.com">dog</a> is nice.
Results OK.
The problems:
1- If the text contains the word dogs, the result is incorrect because of the "s".
2- I tried splitting the text on spaces to fix it, but if the term is a compound such as "new year", the result is incorrect again.
Does anyone have any suggestions to do it and fix these problems (plural and compound words)?
Thanks for the help.
You can use the Snowball package from Lucene.Net's contrib for stemming (words->word, came->come, having->have, etc.), but you will still have trouble with compound words.
If you roll your own solution, I have had good success with the .NET pluralization capabilities:
http://msdn.microsoft.com/en-us/library/system.data.entity.design.pluralizationservices.pluralizationservice.aspx
Essentially, you can pass a word in its plural form and receive a singular version and vice versa.
This could be fairly intensive depending on how often the content changed, i.e. this wouldn't be a good choice to search thousands of words in real time.
Assuming that you can pre-process/cache the results or that the source file is small, you could:
Run Once
Identify all candidate words from the source file.
Parse/split phrases and pass them through the pluralization libraries to determine their plural counterparts.
Generate (and precompile) simple regular expressions to locate the words that you do want to match. For example, if you want to match "dog" but not "dogs" or "hotdog", you could create a regex like \bdog\b (\b is a word boundary), which could then be executed against the text.
Run Whenever a Search/Replace is Needed
Run your list of source expressions against the text in question. I would suggest ordering the expressions from shortest to longest (otherwise a short expression may replace a word that was just parsed by a longer expression).
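A minimal C# sketch of the search/replace step. Rather than running one expression per word, it swaps in a single combined regex (longest phrases first in the alternation), so text that has just been linked is never scanned again and the ordering problem does not arise. The \b boundaries keep "dog" from matching inside "hotdog", and the optional s is only a naive stand-in for real pluralization. LinkWords is a hypothetical name; the "new year" entry and its URL are made-up examples:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// word/phrase -> link, as it would be read from the XML (case-insensitive lookup).
var links = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["dog"] = "http://www.dog.com",
    ["new year"] = "http://example.com/new-year", // hypothetical entry
};

string LinkWords(string text, Dictionary<string, string> wordLinks)
{
    // One alternation, longest phrases first, so a phrase wins over a shorter
    // entry that starts at the same position.
    string alternatives = string.Join("|",
        wordLinks.Keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k)));

    // Whole words only; "s?" naively accepts a regular plural such as "dogs".
    var pattern = new Regex(@"\b(" + alternatives + @")s?\b", RegexOptions.IgnoreCase);

    // Single pass: each match is wrapped in a link to the entry it matched.
    return pattern.Replace(text, m =>
        $"<a href=\"{wordLinks[m.Groups[1].Value]}\">{m.Value}</a>");
}

Console.WriteLine(LinkWords("The dogs love the new year.", links));
```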
Again, this would be processor intensive to run in real-time (most solutions will be). As always, if you are parsing HTML, you should use an HTML parser, not a regular expression. In this case, you might use a proper parser to locate all text nodes and then perform the search/replace on them.
An alternative solution would be to put the text and keyword list into a database and use SQL Server Full Text Indexing which tends to be pretty smart about these things and supports intelligent match predicates. You could even combine this with a CLR stored procedure to handle things that .NET excels at (like string parsing).
Regardless of the approach, this will not be an exact science.
You're likely going to need a dictionary. Create a text file/XML file that contains both the singular and plural forms of the words you want. At runtime, load them into a Dictionary<String, String>. Then look up the value of <word/> in the dictionary and extract its singular value.
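A minimal sketch of that lookup. The table contents are hypothetical and hard-coded here; in practice they would be loaded from the text/XML file. ToSingular is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plural -> singular table (normally loaded from a file at runtime).
var singularOf = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["dogs"] = "dog",
    ["mice"] = "mouse",
    ["children"] = "child",
};

// Returns the singular form if the word is a known plural, otherwise the word unchanged.
string ToSingular(string word) =>
    singularOf.TryGetValue(word, out var singular) ? singular : word;

Console.WriteLine(ToSingular("dogs")); // dog
```

Normalizing each word this way before looking it up in the word/link list means only the singular form has to be stored there.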
I've made a small program in C#.NET which doesn't really serve much of a purpose: it tells you the chance of your DOOM based on today's news, lol. On load it takes an RSS feed from the BBC website and then looks for keywords which either increase or decrease the percentage chance of DOOM.
Crazy little project; maybe one day the classes will come in handy for something more important.
I receive the RSS in XML format, but it contains a lot of div tags and formatting characters which I don't want in the database of keywords.
What is the best way of removing these unwanted characters and divs?
Thanks,
Ash
If you want to remove the DIV tags WITH content as well:
string start = "<div>";
string end = "</div>";
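// The lazy .*? stops at the first </div>; a character class such as [^</div>] would only exclude single characters. Nested divs are not handled.
string txt = Regex.Replace(htmlString, Regex.Escape(start) + "(?<data>.*?)" + Regex.Escape(end), string.Empty, RegexOptions.Singleline);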
Input: <xml><div>junk</div>XXX<div>junk2</div></xml>
Output: <xml>XXX</xml>
IMHO the easiest way is to use regular expressions. Something like:
string txt = Regex.Replace(htmlString, @"<(.|\n)*?>", string.Empty);
Depending on which tags and characters you want to remove you will modify the regex, of course. You will find a lot of material on this and other methods if you do a web search for 'strip html C#'.
SO question Render or convert Html to ‘formatted’ Text (.NET) might help you, too.
Stripping HTML tags from a given string is a common requirement and you can probably find many resources online that do it for you.
The accepted method, however, is to use a Regular expression based Search and Replace. This article provides a good sample along with benchmarks. Another point worth mentioning is that you would require separate Regex based lookups for the different kinds of unwanted characters you are seeing. (Perhaps showing us an example of the HTML you receive would help)
Note that your requirements may vary based on which tags you want to remove. In your question, you only mention DIV tags. If that is the only tag you need to replace, a simple string search and replace should suffice.
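For the DIV-only case, a plain string replacement might look like this sketch (RemoveDivTags is a hypothetical name). It keeps the text between the tags and assumes the tags carry no attributes, so a <div class="..."> would be left untouched; the case-insensitive Replace overload requires .NET Core 2.0 or later:

```csharp
using System;

// Removes bare <div> and </div> tags (any letter case) but keeps the text between them.
string RemoveDivTags(string html) =>
    html.Replace("<div>", string.Empty, StringComparison.OrdinalIgnoreCase)
        .Replace("</div>", string.Empty, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(RemoveDivTags("<div>Storm warning</div> for <DIV>London</DIV>"));
// prints "Storm warning for London"
```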
A regular expression such as this:
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
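would match a paired HTML element: the opening tag, its content and the matching closing tag (\1 refers back to the tag name). Because of the [A-Z] classes it needs case-insensitive matching (RegexOptions.IgnoreCase), and unpaired tags such as <br> are not matched.
Use this to remove them from your data.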
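A sketch of applying that pattern in C#, assuming you want to keep the text inside the elements: each match is replaced with its inner content ($2), and the replacement is repeated until nothing changes so nested elements are unwrapped too. Singleline lets element content span lines. UnwrapTags is a hypothetical name, and replacing with string.Empty instead would drop the content as well:

```csharp
using System;
using System.Text.RegularExpressions;

// Opening tag, content, and matching closing tag (back-reference \1).
var element = new Regex(@"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Replaces each element with its content, repeating until no paired element is left.
string UnwrapTags(string html)
{
    string previous;
    do
    {
        previous = html;
        html = element.Replace(html, "$2");
    } while (html != previous);
    return html;
}

Console.WriteLine(UnwrapTags("<div class=\"item\"><b>Storm</b> warning</div>"));
// prints "Storm warning"
```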