Directories listing returns results in a nontraditional Alphabetical order - ALL CAPS precede mixed case or lower case #121

Open
SnortsAlot opened this issue Nov 27, 2020 · 4 comments

Comments

@SnortsAlot

Directory listings are returned in an unexpected order.

I'm guessing the sort compares ASCII values, so while a listing in more traditional alphabetical order would look like

Fan
FIRST
fumble
Fuzzy

the index returns

FIRST
Fan
Fuzzy
fumble

apparently prioritizing uppercase in resulting listings.
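
The two orderings above can be reproduced outside the module. This is a minimal Python sketch, not the module's C code. It assumes that Python's default string ordering (by code point) stands in for a byte-wise comparison, which holds for these ASCII names, and that `str.lower` as a sort key stands in for the requested case-insensitive behaviour.

```python
names = ["Fan", "FIRST", "fumble", "Fuzzy"]

# Code point order: every uppercase ASCII letter (65-90) sorts before
# every lowercase one (97-122), so "FIRST" lands ahead of "Fan".
byte_order = sorted(names)

# Case-insensitive order: compare lowercased copies of the names.
caseless_order = sorted(names, key=str.lower)

print(byte_order)      # ['FIRST', 'Fan', 'Fuzzy', 'fumble']
print(caseless_order)  # ['Fan', 'FIRST', 'fumble', 'Fuzzy']
```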

If this is intended, please turn this into a feature request to allow sorting the returned results as if .lower() had been applied to each name. That then raises the question of how other languages' special characters are handled in that ordering.

I've not done extensive testing, but it appears that non-English characters are ordered after z in most cases.

i.e.
a, b, c...
z
then all other characters and diacritical marks (umlaut, cedilla, acute accent, circumflex, tilde, grave, etc.)

à, è, ì, ò, ù - À, È, Ì, Ò, Ù
á, é, í, ó, ú, ý - Á, É, Í, Ó, Ú, Ý
ą, ł, ż, ß, ä, ö, ü, ç, ã, õ

armadillo
monkey
zebra
àlex

versus the more expected

àlex
armadillo
monkey
zebra
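
A small Python sketch of why accented names end up after z, and of one rough way to get the expected order. It assumes the file names are UTF-8 encoded, where every non-ASCII character starts with a byte above 0x7F and therefore compares greater than any ASCII letter. The `strip_accents` helper is hypothetical and only approximates real collation: it drops combining marks after Unicode decomposition, so characters without a decomposition (ł and ß, for example) are left unchanged.

```python
import unicodedata

# "\u00e0lex" is "àlex" written with a precomposed U+00E0.
names = ["armadillo", "monkey", "zebra", "\u00e0lex"]

def utf8_bytes(name):
    """Sort key mimicking a byte-wise comparison of UTF-8 file names."""
    return name.encode("utf-8")

def strip_accents(name):
    """Hypothetical helper: drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# U+00E0 encodes as bytes 0xC3 0xA0, both greater than "z" (0x7A).
byte_order = sorted(names, key=utf8_bytes)

# Compare accent-free lowercase copies; the original name breaks ties.
folded_order = sorted(names, key=lambda n: (strip_accents(n).lower(), n))
```

Here `byte_order` keeps the accented name last, while `folded_order` places it first, next to the other names starting with "a".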

@SnortsAlot changed the title from "Directories listing returns results in a nontraditional Alphabetical order - ALL CAPS precede mixed case or al lower case" to "Directories listing returns results in a nontraditional Alphabetical order - ALL CAPS precede mixed case or lower case" on Nov 27, 2020
@ryandesign
Contributor

Right, it sorts names using ngx_strcmp (a wrapper around the standard strcmp), just like nginx's built-in autoindex module does. It's fairly common for web servers (and UNIX systems generally) to sort directory indexes in this manner.

To perform a case-insensitive sort like you suggest, it would have to use strcasecmp. Sorting which is aware of non-ASCII characters is more complex. Sorting rules (collations) can even change depending on the locale. That would mean that a locale-aware sort function like strcoll would need to be used, and it would need to be possible to configure the server as to what locale to use.

I don't see any wrappers around strcasecmp or strcoll in nginx's utility API documentation. Perhaps you could suggest to the developers of nginx that they add the ability to do case-insensitive sorting and/or locale-aware sorting to their autoindex module. Perhaps in the process of adding that feature, they will add ngx_strcasecmp and ngx_strcoll functions which ngx_fancyindex could then use to implement the same feature.
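
To illustrate what a locale-aware sort with a configurable locale involves, here is a hedged Python sketch rather than anything nginx provides. Python's `locale.strxfrm` plays the role of a `strcoll`-style comparison, and `locale_name` is a hypothetical administrator setting. The function falls back to code point order when the requested locale is not installed, which is one of the practical complications of this approach.

```python
import locale

def sort_with_locale(names, locale_name):
    """Sort names using the collation rules of locale_name.

    locale_name stands for a hypothetical administrator-configured value.
    Falls back to code point order if the locale is not installed.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return sorted(names)
    try:
        # strxfrm turns each name into a key that sorts in collation order.
        return sorted(names, key=locale.strxfrm)
    finally:
        # Collation is process-wide state, so restore the previous setting.
        locale.setlocale(locale.LC_COLLATE, previous)
```

A call such as `sort_with_locale(names, "de_DE.UTF-8")` gives results that depend on the locale data installed on the machine. Note also that the locale setting is global to the process, which is awkward in a server that handles many requests.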

@ryandesign
Contributor

Locale-aware sorting has been mentioned previously in #60.

@aperezdc
Owner

One potential can of worms of using locale-aware collation is that we don't know which locale should be used:

  • The locale configured on the server may not be what the user of a website expects. Not to mention that services on servers are typically configured with a language-agnostic locale like C or C.UTF-8, which would still result in letters with diacritics being sorted in an unexpected way for some people.
  • There is no reliable way of knowing what the user really expects. Browsers send the Accept-Language header, but many users (most, even?) never bother to set their preferred languages in their browser preferences. Some sites try to guess based on the location of the client's IP address, and this results in a terrible user experience (I hate Google for doing this kind of thing): if I am on a trip to Romania I want to keep seeing pages in English because I don't understand Romanian; or I could be accessing some site through a VPN or a proxy.
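
For reference, this is roughly what reading Accept-Language would involve. It is a simplified Python sketch, not a full parser for the header grammar, and the function names and the `"en"` default are hypothetical. Even with correct parsing, the result only reflects what the browser sends, which is the reliability problem described above.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag or tag == "*":
            continue  # skip empty items and the wildcard
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            # Negative quality sorts best first; position keeps header order.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]

def pick_language(header, default="en"):
    """Pick the preferred tag, or a hypothetical default if none is sent."""
    tags = parse_accept_language(header or "")
    return tags[0] if tags else default
```

With a header of `ro;q=0.8, en-GB, en;q=0.9` this prefers `en-GB`, and with a missing or empty header it falls back to the default, which is the case for many clients.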

Using plain byte-wise sorting in ASCII order (what strcmp and ngx_strcmp do) is the only reasonable option. I think we can consider adding case-insensitive sorting, though. Maybe even switching to case-insensitive sorting by default 🤔
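
A possible shape for such an option, sketched in Python rather than in the module's C. The `sort_entries` name and its `case_sensitive` flag are hypothetical, and the default chosen here is only an assumption. One detail worth noting: a case-insensitive comparison treats names such as README and readme as equal, so the sketch uses the original name as a tiebreak to keep the output deterministic.

```python
def sort_entries(names, case_sensitive=True):
    """Sort directory entry names.

    case_sensitive is a hypothetical option; True reproduces the current
    code point ordering, False ignores case.
    """
    if case_sensitive:
        return sorted(names)
    # casefold() is a more aggressive lower(); the original name is the
    # tiebreak for names that differ only in case.
    return sorted(names, key=lambda name: (name.casefold(), name))

entries = ["readme", "Zeta", "README", "alpha"]
print(sort_entries(entries))
# ['README', 'Zeta', 'alpha', 'readme']
print(sort_entries(entries, case_sensitive=False))
# ['alpha', 'README', 'readme', 'Zeta']
```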

@ryandesign
Contributor

I could see a use case where someone runs a server serving files whose names are primarily in one language and wants the sort order to reflect that language. Think about an internal server at a small company serving files only for the employees of that company.

Allowing the web site visitor to influence the locale of the sort order is probably beyond the scope of what a web server module could be expected to do. Allowing the returned content to vary based on a header is bad for caching too.

Allowing the server administrator to select case-insensitive sorting (#78, #124) is great, but again it's probably out of scope to allow the web site visitor to select that. To remain consistent with what Apache and nginx server administrators expect, I would recommend keeping case-sensitive sorting as the default.


3 participants