chore: two minor fixes #147

Open
wants to merge 2 commits into
base: master

WindRunnerMax

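Two minor fixes

  1. Removed the extra spaces in the arguments of the python tokenize_dataset_rows.py command in the README; the extra spaces caused the escape character to escape a space instead.
  2. When cover_alpaca2jsonl.py calls json.dumps, Chinese characters are escaped, which makes the generated jsonl file slightly less readable. json.loads converts the escapes back, so functionality is not affected.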
>>> import json
>>> print(json.dumps({ "intro": "测试"}))
{"intro": "\u6d4b\u8bd5"}
>>> print(json.dumps({ "intro": "测试" }, ensure_ascii=False))
{"intro": "测试"}
>>> print(json.loads(r'{"intro": "\u6d4b\u8bd5"}'))
{'intro': '测试'}
>>> print(json.loads('{"intro": "测试"}'))
{'intro': '测试'}
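
For illustration, here is a minimal sketch of how the second fix might look when writing a jsonl file. The helper names write_jsonl and read_jsonl, the sample records, and the temporary file path are hypothetical and are not taken from cover_alpaca2jsonl.py; the only point is passing ensure_ascii=False to json.dumps and opening the file as UTF-8.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    # Hypothetical helper, not the actual code in cover_alpaca2jsonl.py.
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes.
    # The file is opened as UTF-8 explicitly so the output does not depend on
    # the platform's default encoding.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    # Reads the file back; json.loads accepts both escaped and unescaped text.
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Sample data, assumed for this sketch only.
records = [{"intro": "测试"}, {"intro": "hello"}]
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
write_jsonl(records, path)

with open(path, encoding="utf-8") as f:
    raw = f.read()
print(raw)
```

Reading the file back with read_jsonl(path) should give the same records either way, so the change only affects how readable the file is on disk.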

WindRunnerMax closed this by deleting the head repository on Apr 9, 2023
WindRunnerMax reopened this on Apr 9, 2023