Increasing range of terms parsable into MathExpr #289

Open
3 of 10 tasks
sundararajan-s opened this issue Aug 27, 2020 · 9 comments

Comments

@sundararajan-s

sundararajan-s commented Aug 27, 2020

The following is a list of issues that need to be fixed or improved in the parser for MathExpr:

  • Issues relating to the verb of the clause being present in the TeX expression, e.g. "a divides b" being written as $a|b$.
    These cause the entire tree to be parsed wrongly.
  • Handling inferences, i.e. sentences starting with 'hence', 'thus', 'therefore', etc. Only the part containing the specific word is left unparsed; the rest of the tree parses correctly.
  • Parsing 'There exists' and 'For all' statements. Both of these result in a part that is unparsed.
  • Parsing statements which include lines such as 'consider the', i.e. sentences which construct some element. Here the main part is parsed, and the 'consider' part does not parse, similar to sub-issue 1.
  • Handling sentences which reference the prover, such as those starting with "We can see that". Here most of the tree is unparsed; however, this may be because the sentence is not a logical assertion.
  • A specific instance of verbs being inside the TeX expression, usually in 'if' statements but sometimes in other places.
  • Exceptions raised by using the words ‘each’, ‘those’, ‘these’ or ‘both’. These raise a MatchException at provingground.translation.MathExpr$Determiner$.apply. The fix is simple: add the missing cases to the corresponding unapply function, or reformat the input sentences.
  • Certain issues with adverbs, e.g. "is strictly smaller than" does not parse, but "is smaller than" does. The one part that fails to parse causes the entire tree to fail. On further investigation, the problem seems to be limited to the words 'strictly' and 'also', which are not parsed in the same manner as other adverbs.
  • Add the capability to handle conjunct and disjunct adjectives. Currently the adjective is not parsed correctly, while the rest of the expression parses well.
  • Add support for the specific types of sentences mentioned, i.e. simple declarative sentences, assumptions, assertions, variable type specifications and alternate notation specifications. Alternatively, this may be achieved through the implementation of blocks.
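
To illustrate the determiner sub-issue, here is a minimal sketch of how a non-exhaustive match fails and how adding cases removes the failure. In Scala such a failure surfaces as `scala.MatchError`, which is presumably the exception reported above. The `Det` type, its three cases, and the word-to-case mapping are hypothetical stand-ins, not the actual definitions in `provingground.translation.MathExpr`; in particular, treating 'each' and 'both' as universal is only an assumption for the example.

```scala
object DeterminerSketch {
  // Hypothetical determiner cases; the real MathExpr.Determiner may differ.
  sealed trait Det
  case object Indefinite extends Det // "a", "an"
  case object Definite extends Det   // "the"
  case object Universal extends Det  // "every", "all"

  // Non-exhaustive match: any word not listed throws scala.MatchError.
  def strict(word: String): Det = word.toLowerCase match {
    case "a" | "an"      => Indefinite
    case "the"           => Definite
    case "every" | "all" => Universal
  }

  // Extended match: the four problem words are covered (the mapping is an
  // assumption), and unknown words return None instead of throwing.
  def lenient(word: String): Option[Det] = word.toLowerCase match {
    case "a" | "an"                        => Some(Indefinite)
    case "the" | "these" | "those"         => Some(Definite)
    case "every" | "all" | "each" | "both" => Some(Universal)
    case _                                 => None
  }
}
```

Returning an `Option` means an unrecognised determiner leaves one node unparsed rather than aborting the whole parse with an exception.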
@siddhartha-gadgil
Owner

  1. Does your first point mean we extend the MathExpr language (to allow contexts?)
  2. At present TeX expressions are always assigned the part of speech "proper noun". For a|b etc. we need a different part of speech (I do not know if a single word type is possible here).
  3. As you may have noticed, many cases are handled by pre-processing, ideally by merging tokens and assigning a correct part-of-speech tag.
  4. As you go along, please give examples and precise errors for the cases; e.g. unparsed, wrongly parsed, or part wrongly parsed so the whole is unparsed.

@sundararajan-s
Author

  1. That is one possibility. I was thinking it would be simpler to just drop the specific words. I think this will be needed in the subsequent step, the conversion from MathExpr to HoTT.
  2. I do not think so either. There are certain simple cases for which a fix is possible; I shall experiment with those. If they cause no issues, I may add them temporarily.
  3. Actually I did not notice much preprocessing. The preprocessing in the TeXParsed class is commented out, and besides that I did not find any preprocessing.
  4. I shall do that. I shall edit the original issue with those.
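
The "drop the specific words" idea for inference sentences could look roughly like the sketch below. It is only an illustration under stated assumptions: the marker list is taken from the sub-issue above, `splitMarker` is a hypothetical helper that does not exist in provingground, and the marker is returned rather than discarded in case a later step wants it.

```scala
object InferenceWordsSketch {
  // Inference markers named in the sub-issue; extend as needed.
  private val markers = Set("hence", "thus", "therefore")

  // Split a leading inference marker (plus a following comma and spaces) off
  // the sentence. Returns the marker, if any, and the remaining sentence.
  def splitMarker(sentence: String): (Option[String], String) = {
    val trimmed = sentence.trim
    val (first, rest) = trimmed.span(c => c.isLetter)
    if (markers(first.toLowerCase))
      (Some(first.toLowerCase), rest.dropWhile(c => c == ',' || c.isWhitespace))
    else
      (None, trimmed)
  }
}
```

For example, `InferenceWordsSketch.splitMarker("Hence, $n$ is even.")` yields the marker `hence` and the remainder `$n$ is even.`, which can then be passed to the parser on its own.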

@siddhartha-gadgil
Owner

  • The language should be extended if and only if the meaning of the sentence cannot be expressed. Otherwise one changes the parsing.
  • The POS tags are modified in a few cases. I think "such that" is replaced with "where". There isn't much preprocessing because there isn't much of anything specific.
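
A rough sketch of this kind of preprocessing, merging two tokens into one and assigning a new part-of-speech tag, is given below. The `Tagged` type and `mergeSuchThat` function are hypothetical and do not use the Stanford parser's own classes. The tags are Penn Treebank tags (`WRB` is the wh-adverb tag), and the tags shown on the input tokens in the usage note are merely plausible, not actual tagger output.

```scala
object MergeTokensSketch {
  // A word together with its part-of-speech tag (Penn Treebank convention).
  final case class Tagged(word: String, tag: String)

  // Replace each adjacent "such" + "that" pair by the single token "where",
  // tagged WRB; all other tokens pass through unchanged.
  def mergeSuchThat(tokens: List[Tagged]): List[Tagged] = tokens match {
    case Tagged(a, _) :: Tagged(b, _) :: rest
        if a.equalsIgnoreCase("such") && b.equalsIgnoreCase("that") =>
      Tagged("where", "WRB") :: mergeSuchThat(rest)
    case head :: rest => head :: mergeSuchThat(rest)
    case Nil          => Nil
  }
}
```

Applied to the tokens `$n$/NNP such/JJ that/IN $n > 0$/NNP`, this gives `$n$/NNP where/WRB $n > 0$/NNP`.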

@sundararajan-s
Author

sundararajan-s commented Aug 27, 2020

  • In that case I don't think the language will need to be extended for that issue. However, the adverb issue will require an extension to the language.

  • The substitution was commented out, I shall re-enable it and see the results.

@siddhartha-gadgil
Owner

If it was commented out it probably is unnecessary due to a change somewhere, either my code or the Stanford parser.

@sundararajan-s
Author

sundararajan-s commented Sep 2, 2020

Added a new sub-issue regarding conjunct adjectives.

@sundararajan-s
Author

The sub-issue regarding verbs inside TeX expressions has been solved by replacing the specific TeX expression, for example, $a > b$ with "$a > b$ is true". The correct TeX expression is selected by iterating over all possible swaps and checking which one parses.
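
The search over swaps can be sketched as follows. This is not the code in the repository: `candidates` and `firstParsing` are hypothetical names, the `parses` argument stands in for a call to the real parser, and the sketch assumes TeX fragments are delimited by single `$` signs. It tries every subset of fragments, smallest first, so the unmodified sentence is checked before any rewrite.

```scala
object TeXSwapSketch {
  // Matches an inline TeX fragment such as $a > b$ (single-dollar delimiters assumed).
  private val texPattern = """\$[^$]+\$""".r

  // Every sentence obtained by appending " is true" to some subset of the TeX
  // fragments, ordered by the number of fragments changed.
  def candidates(sentence: String): List[String] = {
    val matches = texPattern.findAllMatchIn(sentence).toList
    val subsets = matches.indices.toSet.subsets().toList.sortBy(_.size)
    subsets.map { chosen =>
      // Rewrite from right to left so that earlier offsets stay valid.
      matches.zipWithIndex.reverse.foldLeft(sentence) {
        case (s, (m, i)) if chosen(i) =>
          s.substring(0, m.end) + " is true" + s.substring(m.end)
        case (s, _) => s
      }
    }
  }

  // First candidate accepted by the supplied parse check, if any.
  def firstParsing(sentence: String, parses: String => Boolean): Option[String] =
    candidates(sentence).find(parses)
}
```

A sentence with n TeX fragments produces 2^n candidates, so the search is exhaustive and its cost doubles with each additional fragment.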

@siddhartha-gadgil
Owner

siddhartha-gadgil commented Feb 15, 2021 via email

@sundararajan-s
Author

For now, I'm doing an exhaustive search, but I think we could speed it up in the future with some NLP methods.
