Use standardized language identifiers for lbx files #160
General instructions can be found at the old SF wiki for biblatex (edit: that file is now available in an updated version on the GitHub wiki, please use: https://github.com/plk/biblatex/wiki/Checklist-for-submitting-a-new-localisation-file-(.lbx)). For testing see example Languages based on the Latin alphabet should be encoded in ASCII. That way they will be supported by any backend (BibTeX variants and biber). Regarding your other questions:
|
Audrey, Thanks for the quick answer. I'll build the framework it takes to produce the several lbx files that will be necessary to really i18n Biblatex. The problem with working on a single lbx file at a time is that you can't compare with nearby languages or change your mind later about a better translation, because after you have written a token, the original is gone. I'll read in all existing lbx files and then produce the database that will be needed to drive the manufacturing of the new ones. There are 45 languages in Babel which are not in Biblatex, so it will take an organized effort of more than just programmers to get there. I understand that the problems of keeping track of changes (via GitHub) and testing are serious, but I'll produce the files and send them to you ready. Supporting all country* strings is fairly easy! I already have most country names in some 200 languages, so I'll produce the files and make them available to you, if you want to use them... but I would strongly recommend the use of a separate file for that, so as not to overload the lbx's. Before I get started I have one small question. You mention the use of ".\isdot" on the Wiki page, but I do not see any occurrence of that in any of the lbx's. Is that really necessary at this level? Thanks! |
I'm for doing this if we can have a way for the releaser(s) (which is currently me) to generate all current .lbx files for a release on demand. I would prefer something like a db and a pull interface in Perl (biber is all in perl ...) which generates the .lbx. If the db was something like SQLite, the db could also be in the biblatex git repo. The problem then is that contributors would either still send diffs against the generated .lbx or would need to look at the db, which is probably out of the question for most people. Text files are easier for this but, as you say, we don't get the coverage or consistency we need in future. |
Philipp Lehman wrote the wiki page. I'm not sure what use-case he had in mind for About overloading the lbx files or separate files for country-specific strings - this is what I meant by "unwieldy". Certain aspects of the core biblatex styles are demonstrative rather than exhaustive. This is one good example. Users can easily extend the lbx files. If you're wanting to share all those extra strings with others, consider an add-on package. The DB/spreadsheet could be maintained similar to the localization keys document - just an extra resource, but not necessary for contributing lbx files. Note that only a fraction of the current lbx files are actually complete, so between-language comparisons are limited. |
It will take some building, but I think it is the only way to go! Imagine making a structural change and having to change/test 45 lbx files! I also want to build an interface so one can choose 3 to 4 languages to compare/edit in the DB - as the coverage gets bigger that will be more important. Changes to lbx's made by users or entered directly on GitHub should integrate easily back into the DB... because those will continue to happen! I'll take a detour and come back when I am able to generate all the current 20 essential lbx files exactly the way they are right now. PN |
I am almost done with the back-end to produce the lbx files from a DB. I can produce lbx files that are almost identical to the existing ones and get some 50 more languages in the fray.... the problem here will be to get Babel to do the same thing ...but at this point I have an important question: Why are we using a separate i18n LBX set of files, if we could use the ones from the CSL project at
In my (uninformed) way to view it, there are plenty of reasons to use it instead of the lbx's:
Paulo Ney |
I like the idea of using standards like this but there are some things to consider though:
|
Answering each of your questions/comments:
PN |
Well we could consider the CSL route later if they were more to our needs but currently, they're not really. I had this argument with the "generic bib system" people a few years ago - they didn't seem to understand that high-quality bib typesetting needs semantic integration into the typesetting - there is no good "generic" solution ... |
I can produce identical lbx's already. When they differ, it is because the original lbx's have something wrong - a space out of place, etc ... I am using MySQL because at the moment it is what I have on one particular server where I am working with someone else on the project, but I am writing very generic code that could be changed to anything. I would like to add that one more advantage of doing this via the DB is that you can then interface with people all over who are interested in i18n of biblatex. They would just need to enter the data in an interface and their lbx files could be exported and later included in the distribution. |
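The round-trip check described above (regenerate an .lbx from the DB, then compare it with the file in the repository) could be sketched roughly as follows. This is a minimal Python sketch and not the Perl script discussed in this thread; the helper name `lbx_diff` and the two sample strings are assumptions chosen for illustration.

```python
import difflib


def lbx_diff(original, generated, name="example.lbx"):
    """Return unified-diff lines between an existing .lbx and a regenerated one."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            generated.splitlines(),
            fromfile="a/" + name,
            tofile="b/" + name,
            lineterm="",
        )
    )


# Illustrative content only: the original has a stray space before "=".
original = "  editor = {{editor}{ed\\adddot}},\n  volume  = {{volume}{vol\\adddot}},\n"
generated = "  editor = {{editor}{ed\\adddot}},\n  volume = {{volume}{vol\\adddot}},\n"

for line in lbx_diff(original, generated):
    print(line)
```

An empty result means the generated file reproduces the original exactly; any remaining lines point at whitespace or content differences that need a decision on which side is right.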
Ok - what language are you using for data extraction and creation of .lbxs? |
Perl. |
Good. Biber is all in perl too. Perhaps you could send me a MySQL dump and the perl? I'd like to have a look at it. |
Sure! Give me some time to wrap it up ... I am sorting out the issues with translations into languages that have "gender" right now (so I can parse in the XML), and I will sort out a few other edge cases and send you the stuff. It is just one script. |
No rush, many thanks. We'd then have to think about hosting this in some way or perhaps using SQLite and keeping just a db file in the git repository etc. |
One thing I realized today while writing the maps to parse the XML files of CSL is that they have a nice way to recognize the gender and number (singular or plural) of words in other languages that is NOT present in the lbx file structure! To translate a phrase like Translated and Annotated by ... into languages like Portuguese and Spanish requires one to know the gender of the entity being translated and annotated. If it is a Book or an Album it will be masculine, but if it is a Collection or a Thesis it will be feminine. So I don't really see how this could be done in the realm of the current lbx files. Would someone mind sharing the wisdom on how these problems will be dealt with? PN |
@aboruvka - do you have a comment on this? |
Gender specific strings come up with
Some languages use masculine or feminine ordinals depending on the gender of item being indexed (e.g. series or edition). These are handled on the translator's end with the bibliography "extras" questions I mentioned earlier. For the "by" roles, you could simply add gender/number-specific variants provided that the gender/number of the work is strongly tied to the entrytype (e.g. The same problem has been mentioned in #48 for non-"by" roles, where the gender/number would be specific to the people filling the role. The strings already consider number because this is available in name list processing. Gender would have to be indicated explicitly in the entry somehow. |
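The idea of gender-specific variants tied to the entry type could look roughly like the following. This is a Python sketch of the lookup logic only, not biblatex code: the `-f` key suffix, the `ENTRYTYPE_GENDER` table and the function `by_string` are hypothetical, and the Portuguese strings are illustrative translations of "translated and annotated by".

```python
# Hypothetical table: grammatical gender of the work, keyed by entry type.
ENTRYTYPE_GENDER = {"book": "m", "thesis": "f", "collection": "f"}

# Hypothetical string table: a default form plus an optional feminine variant.
STRINGS = {
    "bytranslatoran": "traduzido e anotado por",    # default (masculine) form
    "bytranslatoran-f": "traduzida e anotada por",  # feminine variant
}


def by_string(key, entrytype):
    """Pick the gendered variant of a 'by' string, falling back to the default."""
    gender = ENTRYTYPE_GENDER.get(entrytype, "m")
    return STRINGS.get(key + "-" + gender, STRINGS[key])


print(by_string("bytranslatoran", "book"))
print(by_string("bytranslatoran", "thesis"))
```

The fallback to the plain key keeps languages without gendered forms unchanged, which matters because only some lbx files would ever define the variants.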
Thanks! That should do it. |
Not quite. There is work on our end to be done. The bibliography extras questions would also need expanding to ask about the gender and number of I'm saying it is probably do-able, but we have to consider work required to get this done, the relative demand for the new feature, and potential issues the feature might open up. If PL knew about this limitation and decided not to implement it, he likely had a very good reason. |
PLK, Audrey, I am down to the wire, and about to start the last upload to the db and the last series of tests. Should I grab a set of fresh lbx files from the development branch ? Or use the last public release? |
Always grab from DEV - it's more up to date ... |
One of the hardest things I had to deal with in this side project was the fact that "language" and "locale" are mixed inside BibLaTeX in some unreasonable ways. It is true that most of what it inherits (or uses) from Babel is in the form of a language, but the LBX files contain so much about "locale" that it is impossible to do it all in the realm of language only. When one says that an entry should have "hyphenation = {portuguese}" that is all good and okay, but the entry: language = {portuguese} should never be expected to format an entry properly, because Iran, Bahamas, Kazakhstan, ... are written in one way in pt_PT and in another way in pt_BR. In order to get around my difficulties in entering the translated terms in a DB and importing some new ones, I had to literally introduce locales in my table of languages and vice-versa... something a programmer should never have to do! Now that internationalization is really coming, in order to manage this well and be able to expand into languages that have many, many locales, it would be nicer to separate these two roles cleanly. I know that for Portuguese alone there is a portuguese.lbx, portuges.lbx, brazil.lbx and brazilian.lbx - but in the way it is laid out it is extremely hard to maintain, to eliminate duplicates and to deal with inconsistencies. One should have a unique file "portuguese.lbx" and a couple of additional files, pt-BR.lbx and pt-PT.lbx, that should call the main one and define some small local components. Labeling of language and locale should follow standards (ISO and IETF) so one can interchange with other bibliography management software, and compatibility with the name space of Babel should be an internal issue that the user should never have to deal with at the bibliography entry level. Just my 2 cents! Paulo Ney |
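The layering proposed above (one main language file plus small locale files that fall back to it) can be illustrated with a simple tag-truncation lookup. This is a Python sketch under stated assumptions: the tables stand in for lbx files, `countryir` is a hypothetical key, and the fallback is plain truncation of the tag rather than a full implementation of any standard matching algorithm.

```python
# Stand-ins for a main language file and two small locale files.
LOCALE_STRINGS = {
    "pt": {"editor": "editor", "volume": "volume"},
    "pt-BR": {"countryir": "Irã"},   # hypothetical key, Brazilian spelling
    "pt-PT": {"countryir": "Irão"},  # hypothetical key, European spelling
}


def fallback_chain(tag):
    """Return the tag followed by its parents, e.g. pt-BR, then pt."""
    parts = tag.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def lookup(tag, key):
    """Find a string in the most specific table that defines it."""
    for candidate in fallback_chain(tag):
        table = LOCALE_STRINGS.get(candidate, {})
        if key in table:
            return table[key]
    raise KeyError(key + " is not defined for " + tag)


print(lookup("pt-BR", "countryir"))
print(lookup("pt-PT", "countryir"))
print(lookup("pt-BR", "editor"))
```

With this layout the shared strings live in one place, and each locale file only carries the handful of entries that actually differ.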
With the 2.8 DEV branch, I'm moving away from the |
Lines 461-462 of the english.lbx file have a curious entry: countryeu = {{European Union}{EU}}, can anyone tell me what the second line means ? Paulo Ney |
I should have said that I saw this: \keyitem{countryeu} The name , abbreviated as \vrb{EU}. in the examples, but I continue puzzled by the meaning of it... Paulo Ney |
Good question - @aboruvka - any idea? It looks to me like a copy-paste which should read:
? |
No idea. I don't think it is a mistake, though, because then |
On top of that, the set of files:
should go through the same factorization we did in the other files. The differences between the 3 files above are minuscule and factorization of the keys will greatly simplify things. In this set:
on top of the differences also being minuscule, there are also keys that are redundant with the files they input. If you could rename them, I could do the clean-up! Paulo Ney |
Philip, Why do the APA definition files have to contain a string like "january" all over again ?
even though they are already defined in the main file ... I know that in the main files most of them (except for Finnish and Croatian) use an abbreviation for the short form of the string and in the APA files all of them are spelled out ... but I thought it was easy to write a style that used the full term and not the abbreviation. Paulo Ney |
It was a long time ago but I think it was because at the time there wasn't any other way to force a non-abbreviated month form ... |
You can probably change the code to use the normal standard month names in If you do that, you can leave the LBX with me and I'll clean them up from Paulo Ney On Mon, Jul 14, 2014 at 4:46 AM, plk [email protected] wrote:
|
Ok, I have used |
Great! Where can I get distribution? Paulo Ney
|
It's also on github |
I'll have the APA file ready soon. For the Bulgarian, please put him in touch with me. At this stage I Paulo Ney On Tue, Jul 15, 2014 at 12:54 AM, plk [email protected] wrote:
|
Thanks - you may already know but there is a checklist we recommend for people adding translations so that the various options can be set correctly in the .lbx. We should probably make this more official ... https://sourceforge.net/p/biblatex/oldwiki/Adding_lbx_Files/ |
Yes! I know that and will direct him to it! He is the maintainer of Babel, Thanks, On Tue, Jul 15, 2014 at 4:06 AM, plk [email protected] wrote:
|
Philip, the new APA files are at: https://drive.google.com/file/d/0B3mOBzjP3W1ndTBPdUlVRWlvdzg/edit?usp=sharing They are factored among themselves (and the total size has reduced quite a I hope with my newly acquired skills that they will work straight from the Paulo Ney On Tue, Jul 15, 2014 at 4:11 AM, Paulo Ney de Souza [email protected]
|
Philip, We interacted quite a bit and he finished with the file - which is one of I don't know how much detail you want on the situation that created this, but So the current structure of the file right now is: \lbx@ifutfinput The guy that wrote the LBX file (Grigori) is the maintainer of the Babel Following BCP47 the best way to handle this would be to have two sets of bg-BG.lbx that would naturally use UTF-8, and another set: bg-BG-ASCII.lbx that would use the \cyrxx input. What are your thoughts on this? Paulo Ney On Tue, Jul 15, 2014 at 12:54 AM, plk [email protected] wrote:
|
This sounds reasonable. To get all this to work, I still need to address the three things above from a couple of weeks ago however. Did you speak to the babel/polyglossia maintainers about potentially supporting BCP47 lang specifiers? In the long term, this will be necessary. |
Cool! I am constantly talking to Javier Bezos at Babel and I have - in the last I am speaking on TUG next week on the subject and if you have the And I have one more question for you: In the new set-up, what are the steps Paulo Ney On Mon, Jul 21, 2014 at 4:19 AM, plk [email protected] wrote:
|
Adding a new language already supported by babel/polyglossia is usually just a matter of having a new .lbx file. |
I started creating a lbx file for the Hungarian language, see https://bitbucket.org/marczellm/latex-magyar-contrib/src. It's incomplete. A main reason why it's incomplete is me being only a BSc student, therefore I don't know the Hungarian equivalent for a lot of terms without seeing them in context. In this I would appreciate instructions on how to test my lbx file so that all bibstrings appear in the PDF and I can see them in context, that would help a lot. Also a lot of extra code seems to be needed because of the nature of the Hungarian language, best demonstrated by some examples:
All in all I'd really appreciate some advice on how to complete my lbx file to get it added to official BibLaTeX. |
Hi @plk, |
I haven't received one for inclusion yet but if you have a |
@mvassilev said he will double-check the translations once again and we could attach it here. Does that make sense? |
Yes, fine. |
Are you interested in an updated translation of biblatex to Finnish? If you are, where do I send it and how? |
Absolutely, Please send it to me. You can send a unified diff if you like. PK Dr P Kime On 04 Jul 2015, at 18:01, ahomansikka [email protected] wrote: Are you interested in updated translation of biblate to Finnish? If you — |
Here it is! --- /usr/local/texlive/2014/texmf-dist/tex/latex/biblatex/lbx/finnish.lbx 2013-10-28 00:52:35.000000000 +0200
|
This (finnish update) has been committed to the DEV branch, thanks. |
Is there a "template" one can use to make the translations to be used in language.lbx? Or should that be done on top of one of the existing files?
I would like to create the files for Romanian, Vietnamese, Chinese and Japanese and I do have people in the office who are capable of making the translations and have experience with bibliographies, but NONE of them are programmers.
Also: Is there a guide on how to add support for a new language? Even though it is easy to understand what goes on inside \DeclareBibliographyStrings{ }, I would like to know when it is preferable to use tex-encoding as opposed to utf8, for example.
Other questions are:
1- Can one add support for a language that is not supported by Babel?
2- When does one use \adddot and when does one use \adddotspace ?
3- Why is country support (within language.lbx) limited to Germany, EU, US, France and GB ?
4- Are you using a framework to do this? In general it is easier to manage them in a single spreadsheet with the translations to each language in each column and a script that reads the column and writes the LBX files! The translators can then easily compare to "nearby" languages and easily make other translations.
Is work by others on this kind of issue welcomed ?
Thanks for the great package!
Paulo Ney
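
The spreadsheet idea from question 4 (one column per language, and a script that reads a column and writes the strings) could be sketched as below. This is only a Python illustration under stated assumptions: the column naming scheme (`<language>_long`, `<language>_short`), the inline sample sheet and the function `render_strings` are made up for the example, and a real .lbx file needs more than this one block.

```python
import csv
import io

# Sample sheet: one row per key, a long and a short form per language.
SHEET = """key,english_long,english_short,german_long,german_short
editor,editor,ed\\adddot,Herausgeber,Hrsg\\adddot
volume,volume,vol\\adddot,Band,Bd\\adddot
"""


def render_strings(rows, language):
    """Render the \\DeclareBibliographyStrings block for one language column."""
    lines = ["\\DeclareBibliographyStrings{%"]
    for row in rows:
        long_form = row[language + "_long"]
        short_form = row[language + "_short"]
        lines.append("  " + row["key"] + " = {{" + long_form + "}{" + short_form + "}},")
    lines.append("}")
    return "\n".join(lines)


rows = list(csv.DictReader(io.StringIO(SHEET)))
print(render_strings(rows, "german"))
```

Because every language is rendered from the same rows, translators can compare neighbouring columns in the sheet while the generated files stay structurally identical.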