Skip to content

Releases: saltudelft/libsa4py

LibSA4Py v0.4.0

08 May 11:46
Compare
Choose a tag to compare

Added

  • Adds the CountParametricTypeDepth visitor to count the depth of parametric types.

Fixed

  • Fixed an issue where type annotations for local variables with the same name in methods were not applied (TypeAlpplier).

LibSA4Py v0.3.0

17 Feb 09:37
Compare
Choose a tag to compare

Added

  • Adding the --tc CLI arg for the process command to type-check source files in Python projects using mypy.
  • Supporting qualified names for classes by adding the q_name field to the JSON output.
  • Adding line and column no. info for the start and end of:
    • Functions definitions
    • Class definitions
    • Module-level variables
    • Class variables
    • Functions' local variables
  • Improved the performance of the NLP preprocessing quite significantly (~10x).
  • Adds a CST visitor (TypeAnnotationCounter) for counting the total number of type annotations in source files.

Fixed

  • When applying types to functions' parameters, parameters with the default value of lambdas causes an exception for matching functions' signature.

Changed

  • Python 3.6 is no longer supported.

LibSA4Py v0.2.0

02 Jun 13:09
Compare
Choose a tag to compare

Added

  • Integrated pyre into the pipeline for inferring the types of variables in source code files of given projects
  • Extracting the usage of module constants, class variables, and local variables across a source code file (Contextual hint)
  • Adding extensive unit tests for testing various features and components of LibSA4Py.
  • Minor improvement to the SpaceAdder code transformer for handling some rare edge cases.
  • The --l CLI arg for merge option to process a specified number of projects.
  • Adding a CST transformer to reduce the depth of parametric types.
  • Extracting type annotations with a qualified name from source code files.
  • Adding apply command to apply inferred type annotations to source code files.
  • Adding a qualified name for functions in the JSON field q_name.
  • Adding line and column no. for the start and end of functions in the JSON field fn_lc.

Fixed

  • Malformed output sequences containing string literal type, i.e., Literal['Blah \n blah'].
  • Malformed output sequence for the Equal operator (i.e. ==) in comparisons
  • Extracting self variables in multi assignments expressions like self.x,self.y=1,2.
  • Replacing imaginary numbers with the [numeric] token.

Removed

  • Removing the unused input_projects argument from the Pipeline class.

LibSA4Py v0.1.0

04 Mar 13:58
Compare
Choose a tag to compare

Added

Pipeline

  • Parallel pipeline to speed up processing a Python dataset using all CPU cores
  • Storing processed Python projects in JSON-formatted files.
  • Excluding duplicate files of a dataset from processing.
  • Add file set (train/test/validation) to processed project if given.
  • Applying standard NLP operations on identifies in a module.
  • Excluding cached projects before running the pipeline if specified.
  • Throwing NullProjectException for projects that have no source code files.

AST-based Extractor

  • Creating a normalized Seq2Seq representation of a source code file aligned with a sequence of identifiers' type.
  • Extracting import names of a module.
  • Extracting the name of global variables in a module with their type annotations (if present).
  • Calculating type annotation coverage for the whole project and its source code files.
  • Extracting the name of classes in a module.
  • Extracting the name of class variables and their type annotation (if present).
  • Extracting the name of functions in a module or in a class.
  • Extracting the name of functions' parameters and their type annotations (if present).
  • Extracting return expressions in functions.
  • Extracting the occurrence of a function's parameters in the function's body.
  • Extracting the return type of functions (if present).
  • Extracting docstring for functions' parameters and their return type.
  • Extracting short and long descriptions of functions in their docstring.

AST-based Transformers

  • Adding space around source code tokens for better tokenization.
  • Removing comment and docstring from source code for its normalized Seq2Seq representation.
  • Removing string literals from source code for its normalized Seq2Seq representation.
  • Removing numeric literals from source code for its normalized Seq2Seq representation.
  • Removing type annotations from source code for its normalized Seq2Seq representation.
  • Propagating the type of functions' parameters in the function body and module-level constants

Fixed

  • A special case where uninitialized variables with types caused exceptions.
  • A case where variables in a tuple couldn't be extracted in multiple assignments.
  • Handling nested tuples in multiple assignments for extracting var names.
  • A case where a type-annotated class attribute is not initialized for removing its type