Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
This question has been asked before in regard to other languages, but I couldn't find anything on using regex or any other algorithm to solve this in C#.
For example:
Photosynthesis maintains atmospheric oxygen levels and supplies all of
the organic compounds and most of the energy necessary for life on
Earth. Most cases, oxygen is also released as a waste product. (((((THIS SERIES OF SPACES HERE THAT SUGGEST THE END OF A PARAGRAPH))))
Although photosynthesis is performed differently by different
species, the process always begins when energy from light is absorbed
by proteins called reaction centers that contain green chlorophyll
pigments.
should be formatted as:
Photosynthesis maintains atmospheric oxygen levels and supplies all of
the organic compounds and most of the energy necessary for life on
Earth.
Although photosynthesis is performed differently by different species,
the process always begins when energy from light is absorbed by
proteins called reaction centers that contain green chlorophyll
pigments.
How do I get this done?
var SpacedText = "Some sample text.     This should be a new paragraph.";
var NewlineText = Regex.Replace(SpacedText, @"\s{2,}", Environment.NewLine);
Change the 2 in the regex for however many spaces you want it to break on.
Environment.NewLine can be replaced with whatever newline delimiter you need (for example, <br /> for HTML).
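For reference, here is a minimal sketch of the same idea wrapped in a reusable method. The class name ParagraphBreaker and the method BreakOnWhitespaceRuns are made up for illustration; only Regex.Replace comes from the answer above. Note that \s matches any whitespace, so tabs and existing line breaks count toward the run as well.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ParagraphBreaker
{
    // Replaces every run of at least minSpaces whitespace characters
    // with the given newLine delimiter.
    public static string BreakOnWhitespaceRuns(string text, int minSpaces, string newLine)
    {
        // Builds a pattern such as \s{2,} from the requested minimum.
        string pattern = @"\s{" + minSpaces + ",}";
        return Regex.Replace(text, pattern, newLine);
    }
}
```

Calling ParagraphBreaker.BreakOnWhitespaceRuns(SpacedText, 2, Environment.NewLine) should behave like the one-liner above.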
My best guess is to match the period that ends a sentence, together with any trailing whitespace, when it is followed by a line break, and to replace the whole match with a period and a carriage return/line feed.
In this case the regex would be
\.\s*[\r\n]+
http://regex101.com/r/cU2tF9/1
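As a rough sketch, that pattern could be applied in C# as shown here. The names SentenceBreaks and Normalize are hypothetical. Be aware that this only tidies up lines that already end with a period; it cannot tell a paragraph end apart from a wrapped line that happens to end a sentence.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SentenceBreaks
{
    // Replaces a period, any trailing whitespace, and one or more line
    // breaks with a period followed by a single newLine delimiter.
    public static string Normalize(string text, string newLine)
    {
        return Regex.Replace(text, @"\.\s*[\r\n]+", "." + newLine);
    }
}
```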
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am very sure that there is a technical term for this problem, but unfortunately I do not know it.
I have an alphabetical charset, and the requirement is to create every combination of its characters up to a maximum length.
The idea is (sample):
Generate a collection of A, AA, AAA, AAAA, AAAAA, AAAAAA
Next: A, AB, ABA, ABAA, ABAAA
Next: A, AB, ABB, ABBA, ABBAA
The reason:
We have to query an API that delivers search results.
But if I don't get search hits from the API on AAA, I don't need to search for AAAA anymore, because it can't get search hits either. I can then move on to AAB.
The question:
My problem is that I'm not sure how the code has to be built to achieve this goal. I lack the structural approach.
I've already tried nested loops, but unfortunately I don't get the result.
I also used Combination Libraries, but they focus on other problems.
Many thanks for hints!
What you're looking for is a particular data structure called a Tree, but probably more specifically in your case, a Trie.
Trie data structures are commonly used in things like autocomplete. For example, in a trie that holds the words tea, ted, and ten, if someone typed "te", I can traverse the trie and see which options come after that (tea, ted, ten).
It looks like this would also fit your use case from what I can tell.
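To make that concrete, here is a minimal sketch of a depth-first walk over the implicit trie of all strings, which skips an entire subtree as soon as a prefix returns no hits. This is one possible reading of the question, not a definitive implementation: the names PrefixSearch and FindPrefixesWithHits are made up, and the hasHits delegate is a stand-in for the real API call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class PrefixSearch
{
    // Returns every string (up to maxLength) for which hasHits is true,
    // never querying an extension of a string that had no hits.
    public static List<string> FindPrefixesWithHits(
        string alphabet, int maxLength, Func<string, bool> hasHits)
    {
        var found = new List<string>();
        Visit("", alphabet, maxLength, hasHits, found);
        return found;
    }

    private static void Visit(
        string prefix, string alphabet, int maxLength,
        Func<string, bool> hasHits, List<string> found)
    {
        if (prefix.Length == maxLength)
            return; // maximum length reached, nothing left to extend

        foreach (char c in alphabet)
        {
            string candidate = prefix + c;
            if (!hasHits(candidate))
                continue; // prune: no longer string with this prefix is tried

            found.Add(candidate);
            Visit(candidate, alphabet, maxLength, hasHits, found);
        }
    }
}
```

With the alphabet "AB" and a maximum length of 3, a query for AAB is only made if AA had hits, and nothing starting with B is tried beyond B itself when B has none.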
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Hello experts, I have to generate a series of folders in a specified location from TextBox input. I have two text boxes that specify the limits of the series (say, 30 folders). The problem I am facing is that the folder names I provide are alphanumeric (say 121cs3h101).
How do I set the limits when I provide alphanumeric values?
(For example: I provide textbox1=12cs3h101 and textbox2=12cs3h131, and I need the series between those limits to be generated.) I am working with Visual Studio 2013 in a C# Windows Forms application. Thanks in advance.
OK, I will try to give you a lead.
To parse a string or find specific characters, one can use Regex.Match or a simpler method called String.Split. In both cases you have to be aware of how your string is structured and how it can vary. The limits of variation are very important.
If, as you say, the beginning is always "12cs3h", you can either split the string at the character 'h':
string[] sa = s.Split('h');
Or you can even use the index of 'h' (since the length seems to be fixed) and take the rest of the string to get the numbers.
int index = s.IndexOf('h');
The rest is up to you, ... convert, enumerate and so on.
EDIT: There is a nice method that does the enumeration job for you: Enumerable.Range. Good luck!
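Putting those pieces together, a minimal sketch might look like this. It assumes both names share the same prefix, that the prefix ends at the first 'h', and that everything after it is a plain number; the names FolderSeries and Names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class FolderSeries
{
    // Builds the folder names from first to last, inclusive.
    // Assumes the format "<prefix ending in 'h'><digits>", e.g. 12cs3h101.
    public static List<string> Names(string first, string last)
    {
        int cut = first.IndexOf('h') + 1;          // position just after 'h'
        string prefix = first.Substring(0, cut);   // e.g. "12cs3h"
        string firstDigits = first.Substring(cut); // e.g. "101"

        int start = int.Parse(firstDigits);
        int end = int.Parse(last.Substring(cut));
        int width = firstDigits.Length;            // keeps leading zeros, if any

        return Enumerable.Range(start, end - start + 1)
            .Select(n => prefix + n.ToString("D" + width))
            .ToList();
    }
}
```

Each resulting name can then be passed to Directory.CreateDirectory(Path.Combine(root, name)), where root is whatever target location you choose.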
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I need to extract some words out of a paragraph of text if the word starts with %! and ends with !%.
I'd imagine regex would be good for this, but unfortunately my regex isn't that great... OK, it's pretty bad... OK, it's nonexistent :(.
EXAMPLE TEXT
You don't want no %!beef!%, boy
Know I run the streets, boy
Better follow me towards
Downtown
What you see is what you get %!girl!%
Don't ever forget girl
Ain't seen nothing yet until you're
%!Downtown!%
EXPECTED RESULT
beef, girl, Downtown
How can I achieve this in C# with or without regex?
Like this:
var reg = new Regex(@"%!(?<word>\w+)!%");
var inStr = @"You don't want no %!beef!%, boy
Know I run the streets, boy
Better follow me towards
Downtown
What you see is what you get %!girl!%
Don't ever forget girl
Ain't seen nothing yet until you're
%!Downtown!%";
var results = reg.Matches(inStr).Cast<Match>().Select(m => m.Groups["word"].Value);
This will give you a list of matched words. Converting it to a comma-separated string is an exercise I'll leave up to you.
Also, next time you should probably do some quick research; you're eventually going to have to learn simple regexes.
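For completeness, one way to get the comma-separated result is string.Join. The sketch below reuses the pattern from the answer; the names MarkedWords and Extract are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class MarkedWords
{
    // Returns the words wrapped in %! ... !% as one comma-separated string.
    public static string Extract(string text)
    {
        var words = Regex.Matches(text, @"%!(?<word>\w+)!%")
            .Cast<Match>()
            .Select(m => m.Groups["word"].Value);
        return string.Join(", ", words);
    }
}
```

Because the pattern uses \w+, a marked phrase that contains spaces or punctuation would not be picked up.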
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
In VB.NET, I can quickly type And/AndAlso on the keyboard. In C#, I'm currently opening Character Map and copying the 'OR' vertical line character manually. Am I missing something that allows quick insertion of the line symbol?
It is also called the pipe key. On many keyboards (UK/US) it is shown as a single broken vertical line; on some keyboards it is a single unbroken vertical line, but I mostly see it as a broken one.
Depends on the keyboard layout, but the | / pipe should be somewhere on the left of the enter key (US layout), or on the left of Z or 1 (first normal, the other with AltGr, UK layout).
Wherever the pipe key is on your keyboard, you can also hold down the ALT key and type the number 124 on the numeric keypad.
I don't know where you are from, but I assume that you have a non-English keyboard. Unfortunately, the C language (where this and other syntactic elements originate) was developed with the standard English keyboard in mind.
I know some people here in Sweden switch to an English keyboard layout when coding, to get rid of the awkward placement of key C/C# characters like | { [ ] } \. (They are all combinations that require the AltGr key. Something had to be done to make room for the Swedish characters ÅÄÖ, which all have their own keys.)
I have the "pipe" symbol as AltGr + 1.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have a FASTA file containing several protein sequences. The format is like
----------------------
>protein1
MYRALRLLARSRPLVRAPAAALASAPGLGGAAVPSFWPPNAAR
MASQNSFRIEYDTFGELKVPNDKYYGAQTVRSTMNFKIGGVTE
RMPTPVIKAFGILKRAAAEVNQDYGLDPKIANAIMKAADEVAE
GKLNDHFPLVVWQTGSGTQTNMNVNEVISNRAIEMLGGELGSK
IPVHPNDHVNKSQ
>protein2
MRSRPAGPALLLLLLFLGAAESVRRAQPPRRYTPDWPSLDSRP
LPAWFDEAKFGVFIHWGVFSVPAWGSEWFWWHWQGEGRPYQRF
MRDNYPPGFSYADFGPQFTARFFHPEEWADLFQAAGAKYVVLT
TKHHEGFTNW*
>protein3
MKTLLLLAVIMIFGLLQAHGNLVNFHRMIKLTTGKEAALSYGF
CHCGVGGRGSPKDATDRCCVTHDCCYKRLEKRGCGTKFLSYKF
SNSGSRITCAKQDSCRSQLCECDKAAATCFARNKTTY
-----------------------------------
Is there a good way to read in this file and store the sequences separately?
Thanks
One way to do this:
1. Create a vector where each element holds a name and a sequence.
2. Go through the file line by line.
3. If the line starts with >, add an element to the end of the vector and save line.substring(1) to that element as the protein name. Initialize the element's sequence to "".
4. If line.length == 0, the line is blank, so do nothing.
5. Otherwise the line doesn't start with >, so it is part of the sequence: append it to the current element (element.sequence += line). This way, each line between >protein2 and >protein3 is concatenated and saved to the sequence of protein2.
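The steps above can be sketched in C# roughly as follows. This is a minimal sketch, not a full FASTA parser: the names FastaParser and Parse are made up, a List of name/sequence pairs stands in for the "vector", and it assumes the dashed lines in the question are only there to frame the example and are not part of the real file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
public static class FastaParser
{
    // Reads FASTA text and returns (name, sequence) pairs in file order.
    public static List<KeyValuePair<string, string>> Parse(TextReader reader)
    {
        var names = new List<string>();
        var sequences = new List<StringBuilder>();

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0)
                continue; // blank line: do nothing

            if (line[0] == '>')
            {
                // Header line: start a new record named after the text following '>'.
                names.Add(line.Substring(1));
                sequences.Add(new StringBuilder());
            }
            else if (sequences.Count > 0)
            {
                // Sequence line: append to the most recent record.
                sequences[sequences.Count - 1].Append(line);
            }
            // Lines that appear before the first header are ignored.
        }

        var records = new List<KeyValuePair<string, string>>();
        for (int i = 0; i < names.Count; i++)
            records.Add(new KeyValuePair<string, string>(names[i], sequences[i].ToString()));
        return records;
    }
}
```

To read from disk, pass a StreamReader opened on the file; a StringReader is handy for trying it out on sample text.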
I think a little more detail about the exact file structure would be helpful. Just looking at what you have (and a quick peek at the samples on Wikipedia) suggests that the name of the protein is prefixed with a > and followed by at least one line break, so that would be a good place to start.
You could split the file on newline, and look for a > character to determine the name.
From there it is a little less clear because I'm not sure if the sequence data is all in one line (no linebreaks) or if it could have linebreaks. If there are none, then you should be able to just store that sequence information, and move on to the next protein name. Something like this:
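// Requires: using System.IO;
// StoreProteinName and StoreSequence are placeholders for your own storage code.
using (var reader = new StreamReader(@"C:\myfile.fasta"))
{
    string line;
    // ReadLine returns null at end of file; blank lines are skipped rather than ending the loop.
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Length == 0)
            continue;
        if (line.StartsWith(">"))
            StoreProteinName(line);
        else
            StoreSequence(line);
    }
}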
If it were me, I would probably use TDD and some sample data to build out a simple parser, and then keep plugging in samples until I felt I had covered all of the major variations in the format.
Can you use a language other than C#? There are excellent libraries for dealing with FASTA files and other biological sequences in Perl, Python, Ruby, Java, and R (off the top of my head). They're usually branded Bio* (e.g. BioPerl, BioJava, etc.).
If you're interested in C or C++, check out the answers to this question over at Biostar:
http://biostar.stackexchange.com/questions/1516/c-c-libraries-for-bioinformatics
Do yourself a favor, and don't reinvent the wheel if you don't have to.