What is the purpose of the StringSegment class? - c#

In the Microsoft.Extensions.Primitives package there is a class, StringSegment, whose comments describe it as:
An optimized representation of a substring.
I was unaware of this particular class, until I discovered aspnet announcement #244, stating: Microsoft.Net.Http.Headers converted to use StringSegments.
Still, looking at the implementation of the StringSegment class, I fail to see what purpose it actually serves. I see a buffer, which I guess would indicate better manipulation of partial character ranges (the 'segment' part, perhaps?). I also see several helper functions which are closely related, if not identical, in behaviour to those already available on regular strings, such as StartsWith/EndsWith, Substring etc. The aspnet-core docs list these in full, but again lack context on "why" it should be used.
So what exactly is the purpose of the StringSegment class and in which scenarios is it applicable to use it?
Is it useful to call the class in my application code, when I manipulate strings?
Can we have an example, where it will be beneficial?

It lets you perform a variety of string operations on a substring of another string, without actually calling Substring() and creating a new string object. It's roughly analogous to the way in C you can have a pointer into the middle of a string.

When parsing text, many new string objects may be created or copied. In theory, this class helps reduce the memory used when handling many or large substrings. Other languages have similar concepts (see std::string_view in C++17).
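To make the "pointer into the middle of a string" idea concrete, here is a minimal sketch. SimpleSegment is a hypothetical, simplified stand-in written for this illustration; it is not the actual Microsoft.Extensions.Primitives implementation, which offers many more members.

```csharp
using System;

// Hypothetical, simplified stand-in for StringSegment: it records where a
// substring lives inside an existing string instead of copying characters.
public readonly struct SimpleSegment
{
    public string Buffer { get; }
    public int Offset { get; }
    public int Length { get; }

    public SimpleSegment(string buffer, int offset, int length)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || length < 0 || offset > buffer.Length - length)
            throw new ArgumentOutOfRangeException(nameof(offset));
        Buffer = buffer;
        Offset = offset;
        Length = length;
    }

    // Compares against the underlying buffer in place; no new string is created.
    public bool StartsWith(string prefix)
    {
        if (prefix == null) throw new ArgumentNullException(nameof(prefix));
        return prefix.Length <= Length
            && string.CompareOrdinal(Buffer, Offset, prefix, 0, prefix.Length) == 0;
    }

    // Narrowing a segment only adjusts offset and length.
    public SimpleSegment Subsegment(int offset, int length)
    {
        if (offset < 0 || length < 0 || offset > Length - length)
            throw new ArgumentOutOfRangeException(nameof(offset));
        return new SimpleSegment(Buffer, Offset + offset, length);
    }

    // A string is allocated only when the caller explicitly asks for one.
    public override string ToString() => Buffer.Substring(Offset, Length);
}
```

A header parser could, for instance, wrap the value part of "Content-Type: text/html" in a segment and call StartsWith("text/") on it without allocating a substring for every header it inspects.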

Related

Any reason not to use `new object().foo()`?

When using extremely short-lived objects that I only need to call one method on, I'm inclined to chain the method call directly to new. A very common example of this is something like the following:
string noNewlines = new Regex("\\n+").Replace(oldString, " ");
The point here is that I have no need for the Regex object after I've done the one replacement, and I like to be able to express this as a one-liner. Is there any non-obvious problem with this idiom? Some of my coworkers have expressed discomfort with it, but without anything that seemed to be like a good reason.
(I've marked this as both C# and Java, since the above idiom is common and usable in both languages.)
This particular pattern is fine -- I use it myself on occasion.
But I would not use this pattern as you have in your example. There are two alternate approaches that are better.
Better approach: Use the static method Regex.Replace(string,string,string). There is no reason to obfuscate your meaning with the new-style syntax when a static method is available that does the same thing.
Best approach: If you use the same static (not dynamically-generated) Regex from the same method, and you call this method a lot, you should store the Regex object as a private static field on the class containing the method, since this avoids parsing the expression on each call to the method.
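A minimal sketch of the "best approach" described above, assuming the pattern is fixed. The class and member names (NewlineCleaner, Collapse) are made up for illustration, and RegexOptions.Compiled is an optional choice rather than something the answer asks for.

```csharp
using System.Text.RegularExpressions;

public static class NewlineCleaner
{
    // Parsed once when the class is first used, then shared by every call.
    private static readonly Regex Newlines =
        new Regex("\\n+", RegexOptions.Compiled);

    // Replaces each run of newline characters with a single space.
    public static string Collapse(string input)
    {
        return Newlines.Replace(input, " ");
    }
}
```

Callers then write NewlineCleaner.Collapse(oldString) and never construct a Regex themselves.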
I don't see anything wrong with this; I do this quite frequently myself.
The only exception to the rule might be for debugging purposes, it's sometimes necessary to be able to see the state of the object in the debugger, which can be difficult in a one-liner like this.
If you don't need the object afterwards, I don't see a problem - I do it myself from time to time as well. However, it can be quite hard to spot, so if your coworkers are expressing discomfort, you might need to put it into a variable so there are no hard feelings on the team. Doesn't really hurt you.
You just have to be careful when you're chaining methods of objects that implement IDisposable. Doing a single-line chain doesn't really leave room for calling Dispose or the using {...} block.
For example:
DialogResult result = new SomeCfgDialog(some_data).ShowDialog();
There is no instance variable on which to call Dispose.
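The difference can be sketched with a small disposable type. TrackedDialog below is a hypothetical stand-in for something like the dialog above; the LiveInstances counter exists only to make the leak visible in this example.

```csharp
using System;

// Hypothetical stand-in for a disposable type such as a dialog.
public sealed class TrackedDialog : IDisposable
{
    private bool _disposed;

    // Counts instances that have been created but not yet disposed.
    public static int LiveInstances { get; private set; }

    public TrackedDialog() { LiveInstances++; }

    public string ShowDialog() => "OK";

    public void Dispose()
    {
        if (_disposed) return;   // tolerate repeated Dispose calls
        _disposed = true;
        LiveInstances--;
    }
}

public static class DialogDemo
{
    // Chained form: the instance is unreachable afterwards,
    // so nothing in this method can call Dispose on it.
    public static string Chained() => new TrackedDialog().ShowDialog();

    // Scoped form: the using block calls Dispose when the block exits,
    // even if ShowDialog throws.
    public static string Scoped()
    {
        using (var dialog = new TrackedDialog())
        {
            return dialog.ShowDialog();
        }
    }
}
```

Both methods return the same result; only the scoped version leaves LiveInstances unchanged.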
Then there is potential to obfuscate intent, hurt rather than improve readability and make it tougher to examine values while debugging. But those are all issues particular to the object and the situation and the number of methods chained. I don't think that there is a single reason to avoid it. Sometimes doing this will make the code more concise and readable and other times it might hurt for some of the reasons mentioned above.
As long as you're sure that the object is never needed again (or you're not creating multiple instances of an identical object), then there's no problem with it.
If the rest of your team isn't comfortable with it, though, you might want to re-think the decision. The team should set the standards and you should follow them. Be consistent. If you want to change the standard, discuss it. If they don't agree, then fall in line.
I think that's OK, and would welcome comments/reasons to the contrary. When the object is not short-lived (or uses unmanaged resources, e.g. COM), this practice can get you into trouble.
The issue is readability.
Putting the "chained" methods on a separate line seems to be the preferred convention with my team.
string noNewlines = new Regex("\\n+")
    .Replace(oldString, " ");
One reason to avoid this style is that your coworkers might want to inspect the object in the debugger. Readability also drops quickly when several such instantiations are combined. For example:
String val = new Object1("Hello").doSomething(new Object2("interesting").withThis("input"));
Generally I prefer using a static method for the specific example you have mentioned.
The only potential problem I could imagine is that, if new Regex somehow yielded null, the chained call would throw a NullReferenceException (a NullPointerException in Java). However, new never returns null in either language; a failed construction throws instead, so this is not a real concern.
If you don't care about the object you invoke the method on, that's a sign that the method should probably be static.
In C#, I'd probably write an extension method to wrap the regex, so that I could write
string noNewlines = oldString.RemoveNewlines();
The extension method would look something like
using System.Text.RegularExpressions;

namespace Extensions
{
    static class SystemStringExtensions
    {
        public static string RemoveNewlines(this string inputString)
        {
            // replace newline characters with spaces
            return Regex.Replace(inputString, "\\n+", " ");
        }
    }
}
I find this much easier to read than your original example. It's also quite reusable, as stripping newline characters is one of the more common activities.

Getting my head around object oriented programming

I am an entry-level .NET developer and use it to develop web sites. I started with classic ASP and last year jumped ship with a short C# book.
As I developed, I learned more and started to see that, coming from classic ASP, I had always used C# like a scripting language.
For example, in my last project I needed to encode video on the web server and wrote code like this:
public class Encoder
{
    public static bool Encode(string videopath) {
        ...snip...
        return true;
    }
}
While searching samples related to my project I’ve seen people doing this
public class Encoder
{
    public static Encode(string videopath) {
        EncodedVideo encoded = new EncodedVideo();
        ...snip...
        encoded.EncodedVideoPath = outputFile;
        encoded.Success = true;
        ...snip...
    }
}
public class EncodedVideo
{
    public string EncodedVideoPath { get; set; }
    public bool Success { get; set; }
}
As I understand it, the second example is more object oriented, but I don't see the point of using the EncodedVideo object.
Am I doing something wrong? Is it really necessary to use this sort of code in a web app?
Someone once explained OO to me using a soda can.
A Soda can is an object, an object has many properties. And many methods. For example..
SodaCan.Drink();
SodaCan.Crush();
SodaCan.PourSomeForMyHomies();
etc...
The purpose of OO Design is theoretically to write a line of code once, and have abstraction between objects.
This means that Coder.Consume(SodaCan.contents); is relevant to your question.
An encoded video is not the same thing as an encoder. An encoder returns an encoded video, and an encoded video may use an encoder, but they are two separate objects. Because they are two different entities serving different functions, they simply work together.
Much like me consuming a soda can does not mean that I am a soda can.
Neither example is really complete enough to evaluate. The second example seems to be more complex than the first, but without knowing how it will be used it's difficult to tell.
Object Oriented design is at its best when it allows you to either:
1) Keep related information and/or functions together (instead of using parallel arrays or the like).
Or
2) Take advantage of inheritance and interface implementation.
Your second example MIGHT be keeping the data together better, if it returns the EncodedVideo object AND the success or failure of the method needs to be kept track of after the fact. In this case you would be replacing a combination of a boolean "success" variable and a path with a single object, clearly documenting the relation of the two pieces of data.
Another possibility not touched on by either example is using inheritance to better organize the encoding process. You could have a single base class that handles the "grunt work" of opening the file, copying the data, etc. and then inherit from that class for each different type of encoding you need to perform. In this case much of your code can be written directly against the base class, without needing to worry about what kind of encoding is actually being performed.
Actually the first looks better to me, but shouldn't return anything (or return an encoded video object).
Usually we assume methods complete successfully without exceptional errors - if exceptional errors are encountered, we throw an exception.
Object oriented programming is fundamentally about organization. You can program in an OO way even without an OO language like C#. By grouping related functions and data together, it is easier to deal with increasingly complex projects.
You aren't necessarily doing something wrong. The question of what paradigm works best is highly debatable and isn't likely to have a clear winner, as there are so many different ways to measure "good" code, e.g. maintainable, scalable, performant, re-usable, modular, etc.
It isn't necessary, but it can be useful in some cases. Take a look at various MVC examples to see OO code. Generally, OO code has the advantage of being re-usable so that what was written for one application can be used for others over and over again. For example, look at log4net for example of a logging framework that many people use.
The way you structure an OO program (which objects you use and how you arrange them) really depends on many factors: the age of the project, the overall size of the project, the complexity of the problem, and a bit of personal taste.
The best advice I can think of that will wrap all the reasons for OO into one quick lesson is something I picked up learning design patterns: "Encapsulate the parts that change." The value of OO is to reuse elements that will be repeated without writing additional code. But obviously you only care to "wrap up" code into objects if it will actually be reused or modified in the future, thus you should figure out what is likely to change and make objects out of it.
In your example, the reason to use the second setup may be that you can reuse the EncodedVideo object elsewhere in the program. Any time you need to deal with EncodedVideo, you don't concern yourself with "how do I encode and use video"; you just use the object you have and trust it to handle the logic. It may also be valuable to encapsulate the encoding logic if it's complex and likely to change. Then you isolate changes to just one place in the code, rather than many potential places where you might have used the object.
(Brief aside: The particular example you posted isn't valid C# code. In the second example, the static method has no return type, though I assume you meant to have it return the EncodedVideo object.)
This is a design question, so the answer depends on what you need, meaning there's no right or wrong answer. The first method is simpler, but in the second case you encapsulate the encoding logic in the EncodedVideo class and can easily change the logic (based on the incoming video type, for instance) in your Encoder class.
I think the first example seems more simple, except I would avoid using statics whenever possible to increase testability.
public class Encoder
{
    private string videoPath;
    public Encoder(string videoPath) {
        this.videoPath = videoPath;
    }
    public bool Encode() {
        ...snip...
        return true;
    }
}
Is OOP necessary? No.
Is OOP a good idea? Yes.
You're not necessarily doing something wrong. Maybe there's a better way, maybe not.
OOP, in general, promotes modularity, extensibility, and ease of maintenance. This goes for web applications, too.
In your specific Encoder/EncodedVideo example, I don't know if it makes sense to use two discrete objects to accomplish this task, because it depends on a lot of things.
For example, is the data stored in EncodedVideo only ever used within the Encode() method? Then it might not make sense to use a separate object.
However, if other parts of the application need to know some of the information that's in EncodedVideo, such as the path or whether the status is successful, then it's good to have an EncodedVideo object that can be passed around in the rest of the application. In this case, Encode() could return an object of type EncodedVideo rather than a bool, making that data available to the rest of your app.
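As a rough sketch of that last option, Encode() can hand back the result object instead of a bool. The actual encoding work is elided in the question, so the output path below is a placeholder assumption (the input path with its extension swapped to .mp4), not a real encoder.

```csharp
using System;
using System.IO;

public class EncodedVideo
{
    public string EncodedVideoPath { get; set; }
    public bool Success { get; set; }
}

public class Encoder
{
    // Returns the result object so callers can read the path and status later.
    public static EncodedVideo Encode(string videoPath)
    {
        if (string.IsNullOrEmpty(videoPath))
            throw new ArgumentException("A video path is required.", nameof(videoPath));

        // Placeholder for the real encoding step, which the question elides.
        string outputFile = Path.ChangeExtension(videoPath, ".mp4");

        return new EncodedVideo
        {
            EncodedVideoPath = outputFile,
            Success = true
        };
    }
}
```

A caller can then write var video = Encoder.Encode(path); and pass video to whatever part of the application needs the path or the status.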
Unless you want to reuse the EncodedVideo class for something else, then (from what code you've given) I think your method is perfectly acceptable for this task. Unless there's unrelated functionality in EncodedVideo and the Encoder classes or it forms a massive lump of code that should be split down, then you're not really lowering the cohesion of your classes, which is fine. Assuming you don't need to reuse EncodedVideo and the classes are cohesive, by splitting them you're probably creating unnecessary classes and increasing coupling.
Remember: 1. the OO philosophy can be quite subjective and there's no single right answer, 2. you can always refactor later :p

What's the point of DSLs / fluent interfaces

I was recently watching a webcast about how to create a fluent DSL and I have to admit, I don't understand the reasons why one would use such an approach (at least for the given example).
The webcast presented an image resizing class, that allows you to specify an input-image, resize it and save it to an output-file using the following syntax (using C#):
Sizer sizer = new Sizer();
sizer.FromImage(inputImage)
    .ToLocation(outputImage)
    .ReduceByPercent(50)
    .OutputImageFormat(ImageFormat.Jpeg)
    .Save();
I don't understand how this is better than a "conventional" method that takes some parameters:
sizer.ResizeImage(inputImage, outputImage, 0.5, ImageFormat.Jpeg);
From a usability point of view, this seems a lot easier to use, since it clearly tells you what the method expects as input. In contrast, with the fluent interface, nothing stops you from omitting/forgetting a parameter/method-call, for example:
sizer.ToLocation(outputImage).Save();
So on to my questions:
1 - Is there some way to improve the usability of a fluent interface (i.e. tell the user what he is expected to do)?
2 - Is this fluent interface approach just a replacement for the nonexistent named method parameters in C#? Would named parameters make fluent interfaces obsolete, e.g. something similar to what Objective-C offers:
sizer.Resize(from:input, to:output, resizeBy:0.5, ..)
3 - Are fluent interfaces over-used simply because they are currently popular?
4 - Or was it just a bad example that was chosen for the webcast? In that case, tell me what the advantages of such an approach are, where does it make sense to use it.
BTW: I know about jquery, and see how easy it makes things, so I'm not looking for comments about that or other existing examples.
I'm more looking for some (general) comments to help me understand (for example) when to implement a fluent interface (instead of a classical class-library), and what to watch out for when implementing one.
2 - Is this fluent interface approach just a replacement for the nonexistent named method parameters in C#? Would named parameters make fluent interfaces obsolete, e.g. something similar to what Objective-C offers:
Well yes and no. The fluent interface gives you a larger amount of flexibility. Something that could not be achieved with named params is:
sizer.FromImage(i)
    .ReduceByPercent(x)
    .Pixalize()
    .ReduceByPercent(x)
    .OutputImageFormat(ImageFormat.Jpeg)
    .ToLocation(o)
    .Save();
The FromImage, ToLocation and OutputImageFormat methods in the fluent interface smell a bit to me. Instead I would have done something along these lines, which I think is much clearer.
new Sizer("bob.jpeg")
    .ReduceByPercent(x)
    .Pixalize()
    .ReduceByPercent(x)
    .Save("file.jpeg", ImageFormat.Jpeg);
Fluent interfaces have the same problems many programming techniques have, they can be misused, overused or underused. I think that when this technique is used effectively it can create a richer and more concise programming model. Even StringBuilder supports it.
var sb = new StringBuilder();
sb.AppendLine("Hello")
    .AppendLine("World");
I would say that fluent interfaces are slightly overdone and I would think that you have picked just one such example.
I find fluent interfaces particularly strong when you are constructing a complex model with it. With model I mean e.g. a complex relationship of instantiated objects. The fluent interface is then a way to guide the developer to correctly construct instances of the semantic model. Such a fluent interface is then an excellent way to separate the mechanics and relationships of a model from the "grammar" that you use to construct the model, essentially shielding details from the end user and reducing the available verbs to maybe just those relevant in a particular scenario.
Your example seems a bit like overkill.
I recently built a small fluent interface on top of the SplitContainer from Windows Forms. Arguably, the semantic model of a hierarchy of controls is somewhat complex to construct correctly. By providing a small fluent API, a developer can now declaratively express how his SplitContainer should work. Usage goes like this:
var s = new SplitBoxSetup();
s.AddVerticalSplit()
    .PanelOne().PlaceControl(()=> new Label())
    .PanelTwo()
    .AddHorizontalSplit()
        .PanelOne().PlaceControl(()=> new Label())
        .PanelTwo().PlaceControl(()=> new Panel());
form.Controls.Add(s.TopControl);
I have now reduced the complex mechanics of the control hierarchy to a couple of verbs that are relevant for the issue at hand.
Hope this helps
Consider:
sizer.ResizeImage(inputImage, outputImage, 0.5, ImageFormat.Jpeg);
What if you used less clear variable names:
sizer.ResizeImage(i, o, x, ImageFormat.Jpeg);
Imagine you've printed this code out. It's harder to infer what these arguments are, as you don't have access to the method signature.
With the fluent interface, this is clearer:
sizer.FromImage(i)
    .ToLocation(o)
    .ReduceByPercent(x)
    .OutputImageFormat(ImageFormat.Jpeg)
    .Save();
Also, the order of methods is not important. This is equivalent:
sizer.FromImage(i)
    .ReduceByPercent(x)
    .OutputImageFormat(ImageFormat.Jpeg)
    .ToLocation(o)
    .Save();
In addition, perhaps you might have defaults for the output image format, and the reduction, so this could become:
sizer.FromImage(i)
    .ToLocation(o)
    .Save();
This would require overloaded constructors to achieve the same effect.
It's one way to implement things.
For objects that do nothing but manipulate the same item over and over again, there's nothing really wrong with it. Consider C++ Streams: they're the ultimate in this interface. Every operation returns the stream again, so you can chain together another stream operation.
If you're doing LINQ, and doing manipulation of an object over and over, this makes some sense.
However, in your design, you have to be careful. What should the behavior be if you want to deviate halfway through? For example:
var obj1 = original.Shrink(0.50); // obj1 is now 50% of original
var obj2 = original.Shrink(0.75); // is obj2 now 75% of obj1, or 75% of the original?
If obj2 is 75% of the original object, then you're making a full copy of the object every time (which has its advantages in many cases, such as when you're trying to make two slightly different instances of the same thing).
If the methods simply manipulate the original object, then this kind of syntax is somewhat disingenuous. Those are manipulations on the object instead of manipulations to create a changed object.
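A minimal sketch of the copying variant, where every call returns a new object and the receiver is left untouched. Picture and its Scale property are hypothetical names chosen for this illustration.

```csharp
// Immutable variant: Shrink never changes the instance it is called on.
public sealed class Picture
{
    // Size relative to the original image (1.0 = original size).
    public double Scale { get; }

    public Picture(double scale = 1.0)
    {
        Scale = scale;
    }

    // Returns a new Picture; 'this' keeps its own scale.
    public Picture Shrink(double factor)
    {
        return new Picture(Scale * factor);
    }
}
```

With this design, two Shrink calls on the same instance are independent, so the second result is 75% of the original. Chaining the calls (original.Shrink(0.5).Shrink(0.75)) is what makes the second step relative to the first.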
Not all classes work like this, nor does it make sense to do this kind of design. For example, this style of design would have little to no usefulness in the design of a hardware driver or the core of a GUI application. As long as the design involves nothing but manipulating some data, this pattern isn't a bad one.
You should read Domain-Driven Design by Eric Evans to get some idea of why a DSL is considered a good design choice.
The book is full of good examples, best-practice advice and design patterns. Highly recommended.
It's possible to use a variation on a fluent interface to enforce certain combinations of optional parameters (e.g. require that at least one parameter from a group is present, or require that if a certain parameter is specified, some other parameter must be omitted). For example, one could provide functionality similar to Enumerable.Range, but with a syntax like IntRange.From(5).Upto(19) or IntRange.From(5).LessThan(10).StepBy(2) or IntRange.From(3).Count(19).StepBy(17). Compile-time enforcement of overly complex parameter requirements may require defining an annoying number of intermediate-value structures or classes, but the approach can prove useful in simpler cases.
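One way this could look is sketched below. The intermediate types RangeStart and BoundedRange are invented for this example, and only From, Upto, LessThan and StepBy are covered. The point is that the compiler only offers the methods that make sense at each stage: StepBy cannot be called before an upper bound has been chosen.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class IntRange
{
    public static RangeStart From(int start) => new RangeStart(start);
}

// Stage 1: a start value is known; an upper bound must be chosen next.
public sealed class RangeStart
{
    private readonly int _start;
    internal RangeStart(int start) { _start = start; }

    public BoundedRange Upto(int lastInclusive) =>
        new BoundedRange(_start, lastInclusive);

    public BoundedRange LessThan(int endExclusive) =>
        new BoundedRange(_start, (long)endExclusive - 1);
}

// Stage 2: both bounds are known; the range can be enumerated or stepped.
public sealed class BoundedRange : IEnumerable<int>
{
    private readonly int _start;
    private readonly long _last;

    internal BoundedRange(int start, long last) { _start = start; _last = last; }

    public IEnumerable<int> StepBy(int step)
    {
        if (step <= 0) throw new ArgumentOutOfRangeException(nameof(step));
        return Iterate(step);
    }

    public IEnumerator<int> GetEnumerator() => Iterate(1).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // A long counter avoids overflow when the range ends near int.MaxValue.
    private IEnumerable<int> Iterate(int step)
    {
        for (long i = _start; i <= _last; i += step)
            yield return (int)i;
    }
}
```

The cost mentioned above is visible even here: two extra types for a very small grammar.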
Further to @sam-saffron's suggestion regarding the flexibility of a Fluent Interface when adding a new operation:
If we needed to add a new operation, such as Pixalize(), then, in the 'method with multiple parameters' scenario, this would require a new parameter to be added to the method signature. This may then require a modification to every invocation of this method throughout the codebase in order to add a value for this new parameter (unless the language in use would allow an optional parameter).
Hence, one possible benefit of a Fluent Interface is limiting the impact of future change.

Ab-using languages

Some time ago I had to address a certain C# design problem when I was implementing a JavaScript code-generation framework. One of the solutions I came up with was using the "using" keyword in a totally different (hackish, if you please) way. I used it as syntactic sugar (well, originally it is one anyway) for building a hierarchical code structure. Something that looked like this:
CodeBuilder cb = new CodeBuilder();
using(cb.Function("foo"))
{
    // Generate some function code
    cb.Add(someStatement);
    cb.Add(someOtherStatement);
    using(cb.While(someCondition))
    {
        cb.Add(someLoopStatement);
        // Generate some more code
    }
}
It works because the Function and While methods return an IDisposable object that, upon disposal, tells the builder to close the current scope. Such a thing can be helpful for any tree-like structure that needs to be hard-coded.
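The mechanism can be sketched roughly as follows. This is a guess at the shape of such a builder, not the asker's actual framework: the string-based API, the indentation handling and the private Scope class are all assumptions made for the example.

```csharp
using System;
using System.Text;

public sealed class CodeBuilder
{
    private readonly StringBuilder _sb = new StringBuilder();
    private int _depth;

    public IDisposable Function(string name) => Open("function " + name + "()");
    public IDisposable While(string condition) => Open("while (" + condition + ")");
    public void Add(string statement) => Line(statement);

    public override string ToString() => _sb.ToString();

    // Emits the header, increases nesting, and returns a token whose
    // Dispose call closes the scope again.
    private IDisposable Open(string header)
    {
        Line(header + " {");
        _depth++;
        return new Scope(this);
    }

    private void Close()
    {
        _depth--;
        Line("}");
    }

    private void Line(string text) =>
        _sb.Append(' ', _depth * 2).Append(text).Append('\n');

    private sealed class Scope : IDisposable
    {
        private CodeBuilder _owner;
        public Scope(CodeBuilder owner) { _owner = owner; }

        public void Dispose()
        {
            if (_owner == null) return;   // closing twice is a no-op
            _owner.Close();
            _owner = null;
        }
    }
}
```

Each using block then maps to one pair of braces in the generated output, which is why the C# source visually mirrors the structure of the code being generated.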
Do you think such “hacks” are justified? Because you can say that in C++, for example, many of the features such as templates and operator overloading get over-abused and this behavior is encouraged by many (look at boost for example). On the other side, you can say that many modern languages discourage such abuse and give you specific, much more restricted features.
My example is, of course, somewhat esoteric, but real. So what do you think about the specific hack and of the whole issue? Have you encountered similar dilemmas? How much abuse can you tolerate?
I think this is something that has blown over from languages like Ruby that have much more extensive mechanisms to let you create languages within your language (google for "dsl" or "domain specific languages" if you want to know more). C# is less flexible in this respect.
I think creating DSLs in this way is a good thing. It makes for more readable code. Using blocks can be a useful part of a DSL in C#. In this case I think there are better alternatives. The use of using in this case strays a bit too far from its original purpose. This can confuse the reader. I like Anton Gogolev's solution better, for example.
Offtopic, but just take a look at how pretty this becomes with lambdas:
var codeBuilder = new CodeBuilder();
codeBuilder.DefineFunction("Foo", x =>
{
    codeBuilder.While(condition, y =>
    {
    });
});
It would be better if the disposable object returned from cb.Function(name) was the object on which the statements should be added. That internally this function builder passed through the calls to private/internal functions on the CodeBuilder is fine, just that to public consumers the sequence is clear.
So long as the Dispose implementation would make the following code cause a runtime error.
CodeBuilder cb = new CodeBuilder();
var function = cb.Function("foo");
using(function)
{
    // Generate some function code
    function.Add(someStatement);
}
function.Add(something); // this should throw
Then the behaviour is intuitive and relatively reasonable, and the correct usage (below) is both encouraged and prevents this from happening:
CodeBuilder cb = new CodeBuilder();
using(var function = cb.Function("foo"))
{
    // Generate some function code
    function.Add(someStatement);
}
I have to ask why you are using your own classes rather than the provided CodeDomProvider implementations though. (There are good reasons for this, notably that the current implementation lacks many of the C# 3.0 features, but since you don't mention it yourself...)
Edit: I would second Anton's suggestion to use lambdas. The readability is much improved (and you have the option of allowing Expression Trees).
If you go by the strictest definitions of IDisposable then this is an abuse. It's meant to be used as a method for releasing native resources in a deterministic fashion by a managed object.
The use of IDisposable has evolved to essentially be used by "any object which should have a deterministic lifetime". I'm not saying this is right or wrong, but that's how many APIs and users are choosing to use IDisposable. Given that definition it's not an abuse.
I wouldn't consider it terribly bad abuse, but I also wouldn't consider it good form because of the cognitive wall you're building for your maintenance developers. The using statement implies a certain class of lifetime management. This is fine in its usual uses and in slightly customized ones (like @heeen's reference to an RAII analogue), but those situations still keep the spirit of the using statement intact.
In your particular case, I might argue that a more functional approach like @Anton Gogolev's would be more in the spirit of the language as well as maintainable.
As to your primary question, I think each such hack must ultimately stand on its own merits as the "best" solution for a particular language in a particular situation. The definition of best is subjective, of course, but there are definitely times (especially when the external constraints of budgets and schedules are thrown into the mix) where a slightly more hackish approach is the only reasonable answer.
I often "abuse" using blocks. I think they provide a great way of defining scope. I have a whole series of objects that I use for capturing and restoring state (e.g. of combo boxes or the mouse pointer) during operations that may change the state. I also use them for creating and dropping database connections.
E.g.:
using(_cursorStack.ChangeCursor(System.Windows.Forms.Cursors.WaitCursor))
{
    ...
}
I wouldn't call it abuse. Looks more like a fancied up RAII technique to me. People have been using these for things like monitors.

Why use a GlobalClass? What are they for?

Why use a GlobalClass? What are they for? I have inherited some code (shown below) and as far as I can see there is no reason why strUserName needs this. What is it all for?
public static string strUserName
{
    get { return m_globalVar; }
    set { m_globalVar = value; }
}
Used later as:
GlobalClass.strUserName
Thanks
You get all the bugs of global state and none of the yucky direct variable access.
If you're going to do it, then your coder implemented it pretty well. He/She probably thought (correctly) that they would be free to swap out an implementation later.
Generally it's viewed as a bad idea since it makes it difficult to test the system as a whole the more globals you have in it.
My 2 cents.
When you want to use a static member of a type, you use it like ClassName.MemberName. If your code snippet is in the same class as the member you're referring (in this example, you're coding in a GlobalClass member, and using strUserName) you can omit the class name. Otherwise, it's required as the compiler wouldn't have any knowledge of what class you're referring to.
This is a common approach when dealing with Context in ASP.Net; however, the implementation would never use a single variable. So if this is a web app I could see this approach being used to indicate who the current user is (Although there are better ways to do this).
I use a similar approach where I have a MembershipService.CurrentUser property which then pulls a user out from either SessionState or LogicalCallContext (depending on whether it's a web or client app).
But in these cases these aren't global as they are scoped within narrow confines (Like the http session state).
One case where I have used a global like this would be if I have some data which is static and never changes, and is loaded from the DB (and there's not enough of the data to justify storing it in a cache). You could just store it in a static variable so you don't have to go back to the DB.
On a side note, why was the developer using Hungarian notation to name properties? Even when there was no IntelliSense and all the goodness our IDEs provide, we never used Hungarian notation on properties.
@Jayne, @Josh, it's hard to tell, but the code in the question could also be a static accessor to a static field, which is somewhat different from @Josh's static helper example (where you use instance or context variables within your helper).
Static Helper methods are a good way to conveniently abstract stateless chunks of functionality. However in the example there is potential for the global variable to be stateful - Demeter's Law guides us that you should only play with state that you own or are given e.g. by parameters.
http://www.c2.com/cgi/wiki?LawOfDemeter
Given the rules, there are occasional times when it is necessary to break them. You should weigh the risk of using global state (primarily the risk of creating state/concurrency bugs) against the necessity of using globals.
Well if you want a piece of data to be available to any other class running in the jvm then the Global Class is the way to go.
There are only two slight problems;
One. The implementation shown is not thread safe. The set... method of any global class should be marked critical or wrapped in a mutex.
Even in the naive example above, consider what happens if two threads run simultaneously:
set("Joe") and set("Frederick") race, so a reader cannot know which name it will see; if the write were not atomic, the result could even be "Joederick" or "Fre" or some other permutation.
Two. It doesn't scale well. "Global" refers to a single JVM. A more complex runtime environment like JBoss could be running several intercommunicating JVMs. So the global userid could be 'Joe' or 'Frederick' depending on which JVM your EJB is scheduled on.
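The answer above is phrased in JVM terms, while the code in the question is C#. A minimal C# sketch of the first point, guarding the accessor with a lock, might look like the following. The Gate field is a name invented here, and the wrapping class is assumed from the GlobalClass.strUserName usage shown in the question. Note that assigning a string reference is already atomic in .NET, so a torn value is not the realistic risk; the lock mainly ensures that readers see the latest write and gives compound operations one place to synchronize.

```csharp
public static class GlobalClass
{
    // Private lock object; never lock on a publicly reachable instance.
    private static readonly object Gate = new object();
    private static string m_globalVar;

    public static string strUserName
    {
        get { lock (Gate) { return m_globalVar; } }
        set { lock (Gate) { m_globalVar = value; } }
    }
}
```

This does nothing for the second point: each process (or each machine in a web farm) still has its own copy of the static field.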
