
multiprocessing? #72

Open

ianbstewart opened this issue Sep 4, 2019 · 4 comments

Comments

ianbstewart commented Sep 4, 2019

Hello,

It seems that langid uses multiprocessing under the hood to make classification faster. Is there any way in Python to force langid to use a single process (turn off multiprocessing)?

ianbstewart (Author) commented

This problem still persists, even with a simple script like the one below. Please advise on how to turn off multiprocessing.

```python
import string

import numpy as np
from langid.langid import LanguageIdentifier, model

np.random.seed(123)

lang_id_model = LanguageIdentifier.from_modelstring(model, norm_probs=True)

# generate random text of fixed length
alpha = list(string.ascii_lowercase)
N = 100000
M = 40
txt = [''.join(np.random.choice(alpha, M, replace=True)) for _ in range(N)]

# tag text
lang = []
for txt_i in txt:
    lang_i = lang_id_model.classify(txt_i)
    lang.append(lang_i)
print(len(lang))
```
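
One way to check whether the load comes from numpy's OpenMP/BLAS thread pools (rather than from langid spawning extra processes) is to cap the thread pool around the classification loop. Below is a minimal sketch using the third-party threadpoolctl package; this is an assumption about where the parallelism comes from, not something langid documents.

```python
# Sketch: cap BLAS/OpenMP thread pools around classification.
# Assumes the multi-core usage comes from numpy's threaded backend,
# not from langid itself. Requires: pip install threadpoolctl
from threadpoolctl import threadpool_limits

with threadpool_limits(limits=1):
    lang = [lang_id_model.classify(txt_i) for txt_i in txt]
```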


ianbstewart commented Apr 23, 2020

Update: I forced langid to use fewer threads by setting an environment variable before running the Python script.

```bash
MAX_CPU_USE=20
export OMP_NUM_THREADS=$MAX_CPU_USE
```
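
The same limit can be applied from inside the script, provided the variables are set before numpy (and therefore langid) is imported. A minimal sketch, assuming the extra cores come from the OpenMP/BLAS thread pools that numpy uses rather than from separate processes:

```python
# Sketch: pin thread pools to a single thread from inside the script.
# These variables only take effect if set before numpy is imported.
import os

os.environ["OMP_NUM_THREADS"] = "1"        # OpenMP-based backends
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"        # Intel MKL

from langid.langid import LanguageIdentifier, model

lang_id_model = LanguageIdentifier.from_modelstring(model, norm_probs=True)
print(lang_id_model.classify("this is a short english sentence"))
```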


goors commented Dec 9, 2023

This lib just takes so much CPU, 8 cores out of 8. Don't get me wrong, what you did is impressive, and I don't mean to be a critic or anything, but it is taking up all 8 of my 8 cores.


goors commented Dec 9, 2023

> Update: I forced langid to use fewer threads by setting an environment variable before running the Python script.
>
> MAX_CPU_USE=20
> export OMP_NUM_THREADS=$MAX_CPU_USE

Is this going to slow classification down? What is the impact here?
