[background]
I want to use my own text-image dataset to generate a binary-format dataset for CogView training ('https://github.com/THUDM/CogView'). That repo mentions that the authors use the cogdata toolkit to preprocess their data.
[question]
Could you please tell me how to organize my raw text-image dataset, and how to use the cogdata toolkit to generate the target bin file? For example, should I give each text-image pair the same name, such as 'a dog sits on the ground.txt' and 'a dog sits on the ground.png', or should I use some other layout?
@xiaocaijizzz
More detailed documentation is here: 'https://sleepychord.github.io/cogdata/build/html/index.html'.
For example, you can use '--data_format TarDataset --data_files path_to_your_tar' or '--data_format ZipDataset --data_files path_to_your_zip' when creating the dataset.
Images inside the zip are named like '1.jpg, 2.jpg, ...'.
For the text, I recommend '--text_format dict --text_files path_to_your_txt'.
The text file maps each image name (without the extension) to its caption, like: "{'1':'a dog sits on the ground', '2':'cat', ....}"
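In case it helps, here is a minimal sketch of how same-named pairs (such as 'a dog sits on the ground.png' plus 'a dog sits on the ground.txt') could be repacked into the layout described above: a zip with images renamed to '1.png, 2.jpg, ...' and one text file mapping '1', '2', ... to the captions. The helper names `collect_pairs` and `build_cogdata_inputs` are hypothetical and not part of cogdata. The sketch writes the mapping as JSON, which is an assumption; please check the documentation for the exact serialization that '--text_format dict' expects.

```python
import json
import zipfile
from pathlib import Path


def collect_pairs(folder, image_suffixes=(".png", ".jpg", ".jpeg")):
    """Find images that have a same-named .txt caption file next to them.

    Hypothetical helper: assumes 'name.png' is described by 'name.txt'.
    Images without a caption file are skipped.
    """
    pairs = []
    for image_path in sorted(Path(folder).iterdir()):
        if image_path.suffix.lower() not in image_suffixes:
            continue
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists():
            caption = caption_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, caption))
    return pairs


def build_cogdata_inputs(pairs, zip_path, text_path):
    """Write a zip of images named 1.ext, 2.ext, ... and a key -> caption file.

    Keys are the new image names without the extension ('1', '2', ...).
    Assumption: the mapping is stored as JSON; adjust if cogdata expects
    a different serialization for '--text_format dict'.
    """
    captions = {}
    with zipfile.ZipFile(zip_path, "w") as archive:
        for index, (image_path, caption) in enumerate(pairs, start=1):
            key = str(index)
            suffix = Path(image_path).suffix
            # Store the image under its numeric name, keeping the extension.
            archive.write(image_path, arcname=f"{key}{suffix}")
            captions[key] = caption
    with open(text_path, "w", encoding="utf-8") as handle:
        json.dump(captions, handle, ensure_ascii=False)
    return captions
```

The resulting zip and text file would then be the values passed to '--data_files' and '--text_files' with the '--data_format ZipDataset' and '--text_format dict' options mentioned above.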