-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented support of delimiter options #2
base: master
Are you sure you want to change the base?
Conversation
The implementation only extends original parser a bit to allow to specify delimiters via options and doesn't add any complex parsing rules like combination of different delimiters |
d0905a2
to
8eb7653
Compare
Looks like I can't fix the build error :( |
Hi @zihotki, thanks for the PR! I fixed the build (AppVeyor uses the .appveyor.yml from the master, that's why you couldn't fix the problem in your branch). I'll take a look at the changes and get back to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @zihotki, the PR looks really good. Nice work! I added some comments.
What I'm primarily concerned with is the use of default parameters in the public methods Fields.Matches and Fields.TryParse. If the PartialResponse.Core package is updated, but the dependent packages are build using the previous version, what will happen? Will the .NET runtime accept the new method or will it throw a MissingMethodException? According to the second answer of this question the optional parameters are baked into the call site. If you use old school overloads, this won't be a problem. What do you think?
/// <param name="nestedFieldDelimiters">Characters representing nested field delimiters.</param> | ||
/// <param name="fieldGroupStartDelimiters">Characters representing field group start delimiters.</param> | ||
/// <param name="fieldGroupEndDelimiters">Characters representing field group end delimiters.</param> | ||
public DelimiterOptions(char[] fieldsDelimiters, char[] nestedFieldDelimiters, char[] fieldGroupStartDelimiters, char[] fieldGroupEndDelimiters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to use dictionaries and not a single char per type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually makes a lot of sense but then there will be too many changes for parser. I've tried that way and found that it becomes very complicated to make a hand-crafted parser. So probably better go baby steps and implement version allowing one to use single character delimiters (most of the delimiters are indeed are a single character). And then work on enhancing it further.
/// <param name="value">The value of the field.</param> | ||
public Field(string value) | ||
/// <param name="parts">The value of the field.</param> | ||
public Field(params string[] parts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the constructor changed? As a result Parser new uses a Stack<List> but Field expects an array, so ToArray is called. Less string manipulation, but is this more efficient compared to the string
parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason was to move responsibility of splitting field parts to parser to not to make a dependency on the delimiter options. But looking at it now it made me realize that it will probably have to have it and Wildcard should be also part of the delimiter options.
/// Returns a string that represents the current object. | ||
/// </summary> | ||
/// <returns>A string that represents the current object.</returns> | ||
public override string ToString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the ToString method removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, unwillingness to make a dependency on delimiter options. But according to comment above I'll change that. This leads to the question, if there are more than one field part delimiter which one should be used, first one? Or better to keep a reference to original field string? What do you think?
@@ -17,14 +17,21 @@ public class ParserContext | |||
/// Initializes a new instance of the <see cref="ParserContext"/> class. | |||
/// </summary> | |||
/// <param name="source">A <see cref="TextReader"/> representing the input string.</param> | |||
public ParserContext(TextReader source) | |||
/// <param name="options">Delimiters options for parser.</param> | |||
public ParserContext(TextReader source, DelimiterOptions options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new parameter options for the constructor is a breaking change. We should add a second constructor which only takes the source parameter and passes the source and DelimiterOptions.DefaultOptions to the other constructor.
{ | ||
this.TakeCharactersWhile(character => this.IsWhiteSpace(character)); | ||
this.TakeCharactersWhile(character => char.IsWhiteSpace(character)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why I chose to use another implementation of IsWhiteSpace with only space, tab, new line, and carriage return back then. char.IsWhiteSpace matches more characters, as can be found in the documentation. Is it wise to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding it should be fine. It's not going to make a big change since those additional unicode white-spaces won't (ok, shouldn't) be used there. But char.IsWhiteSpace
looks cleaner to me.
Thanks for the feedback, will review and update PR later today |
…nfigurable (delimiters still limited to be chars)
8eb7653
to
c768049
Compare
Implemented support of delimiter options so that delimiters can be configurable (delimiters still limited to be chars). This should fix dotarj/PartialResponse#23