Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented support of delimiter options #2

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

zihotki
Copy link

@zihotki zihotki commented Jan 2, 2019

Implemented support of delimiter options so that delimiters can be configurable (delimiters still limited to be chars). This should fix dotarj/PartialResponse#23

@zihotki
Copy link
Author

zihotki commented Jan 2, 2019

The implementation only extends original parser a bit to allow to specify delimiters via options and doesn't add any complex parsing rules like combination of different delimiters foo(bar],baz.qwe/rty. Implementing that won't add any additional value but unnecessary complexity.

@zihotki
Copy link
Author

zihotki commented Jan 2, 2019

Looks like I can't fix the build error :(

@dotarj
Copy link
Owner

dotarj commented Jan 15, 2019

Hi @zihotki, thanks for the PR! I fixed the build (AppVeyor uses the .appveyor.yml from the master, that's why you couldn't fix the problem in your branch). I'll take a look at the changes and get back to you.

Copy link
Owner

@dotarj dotarj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zihotki, the PR looks really good. Nice work! I added some comments.

What I'm primarily concerned with is the use of default parameters in the public methods Fields.Matches and Fields.TryParse. If the PartialResponse.Core package is updated, but the dependent packages are build using the previous version, what will happen? Will the .NET runtime accept the new method or will it throw a MissingMethodException? According to the second answer of this question the optional parameters are baked into the call site. If you use old school overloads, this won't be a problem. What do you think?

/// <param name="nestedFieldDelimiters">Characters representing nested field delimiters.</param>
/// <param name="fieldGroupStartDelimiters">Characters representing field group start delimiters.</param>
/// <param name="fieldGroupEndDelimiters">Characters representing field group end delimiters.</param>
public DelimiterOptions(char[] fieldsDelimiters, char[] nestedFieldDelimiters, char[] fieldGroupStartDelimiters, char[] fieldGroupEndDelimiters)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to use dictionaries and not a single char per type?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually makes a lot of sense but then there will be too many changes for parser. I've tried that way and found that it becomes very complicated to make a hand-crafted parser. So probably better go baby steps and implement version allowing one to use single character delimiters (most of the delimiters are indeed are a single character). And then work on enhancing it further.

src/PartialResponse.Core/DelimiterOptions.cs Outdated Show resolved Hide resolved
/// <param name="value">The value of the field.</param>
public Field(string value)
/// <param name="parts">The value of the field.</param>
public Field(params string[] parts)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the constructor changed? As a result Parser new uses a Stack<List> but Field expects an array, so ToArray is called. Less string manipulation, but is this more efficient compared to the string
parameter?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason was to move responsibility of splitting field parts to parser to not to make a dependency on the delimiter options. But looking at it now it made me realize that it will probably have to have it and Wildcard should be also part of the delimiter options.

/// Returns a string that represents the current object.
/// </summary>
/// <returns>A string that represents the current object.</returns>
public override string ToString()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the ToString method removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, unwillingness to make a dependency on delimiter options. But according to comment above I'll change that. This leads to the question, if there are more than one field part delimiter which one should be used, first one? Or better to keep a reference to original field string? What do you think?

@@ -17,14 +17,21 @@ public class ParserContext
/// Initializes a new instance of the <see cref="ParserContext"/> class.
/// </summary>
/// <param name="source">A <see cref="TextReader"/> representing the input string.</param>
public ParserContext(TextReader source)
/// <param name="options">Delimiters options for parser.</param>
public ParserContext(TextReader source, DelimiterOptions options)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new parameter options for the constructor is a breaking change. We should add a second constructor which only takes the source parameter and passes the source and DelimiterOptions.DefaultOptions to the other constructor.

src/PartialResponse.Core/TokenType.cs Outdated Show resolved Hide resolved
src/PartialResponse.Core/TokenType.cs Outdated Show resolved Hide resolved
src/PartialResponse.Core/Tokenizer.cs Outdated Show resolved Hide resolved
{
this.TakeCharactersWhile(character => this.IsWhiteSpace(character));
this.TakeCharactersWhile(character => char.IsWhiteSpace(character));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why I chose to use another implementation of IsWhiteSpace with only space, tab, new line, and carriage return back then. char.IsWhiteSpace matches more characters, as can be found in the documentation. Is it wise to change this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding it should be fine. It's not going to make a big change since those additional unicode white-spaces won't (ok, shouldn't) be used there. But char.IsWhiteSpace looks cleaner to me.

@zihotki
Copy link
Author

zihotki commented Feb 11, 2019

Thanks for the feedback, will review and update PR later today

Vasili Puchko added 2 commits February 15, 2019 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Characters changing
2 participants