Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message for v2-only torrents #300

Open
josecelano opened this issue Sep 19, 2023 · 8 comments
Open

Improve error message for v2-only torrents #300

josecelano opened this issue Sep 19, 2023 · 8 comments
Labels
Easy Good for Newcomers Enhancement / Feature Request Something New
Milestone

Comments

@josecelano
Copy link
Member

If you upload a torrent with only the field for version 2, you get a generic "Internal Server Error".

We should show a message like "Invalid torrent. BitTorrent version 2 is not supported."

@josecelano josecelano added Enhancement / Feature Request Something New Easy Good for Newcomers labels Sep 19, 2023
@mario-nt mario-nt self-assigned this Nov 10, 2023
@josecelano
Copy link
Member Author

I did some work related to parsing the torrent file in this repo:

https://github.com/torrust/torrust-parse-torrent

One of the things I wanted to test was a more verbose parser:

https://github.com/torrust/torrust-parse-torrent/blob/main/src/utils/parse_torrent_verbose.rs

The idea is to get an exact error message is one mandatory field is missing.

@josecelano
Copy link
Member Author

Anyway, I opened this issue becuase I wanted to show a different message if the user is trying to upload an only-V2 torrent (not hybrid).

@mario-nt
Copy link
Contributor

I think we can have two issues, one for the v2 torrents and other for the hybrids (in case we want to support them, as they don't seem to be very popular, both v2 and hybrids).

An important feat is to give more concrete errors on mandatory fields.

@mario-nt
Copy link
Contributor

mario-nt commented Nov 15, 2023

When the user uploads a torrent we have three different scenarios:

The torrent file is a V1 torrent
The torrent file is a V2 torrent
The torrent file is a Hybrid torrent

Right now, we only support V1 torrents, and we create an info hash which can be different from the original info hash as we only take into account some of the fields (mandatory fields), discarding the rest.

If the user tries to upload a non valid torrent, a generic error message appears, but it would be a good idea to show more concrete error messages.

To do that, we need to check if the torrent is a V1 | V2 | Hybrid and then check for mandatory fields.

First, I want to work on creating concrete error messages for the V2 torrents, as they are more popular than hybrids (even though both are not widely used).

Then work on creating concrete messages for hybrid torrents.

And lastly, work on concrete messages for mandatory fields (relates to https://github.com/torrust/torrust-parse-torrent that Jose made).

For the V2 torrents, I want to propose two ways of implementing the logic to the team before writing the code.

I came up with these solutions:

  1. Creating a new struct for V2 torrents with the mandatory fields, and after deserialize it, checking if the info dictionary field "meta version" has a value of 2, because V2 torrents have that field with a value of 2.

  2. Checking the length of the info hash of the torrent file, and also checking if the torrent file has one or two info hashes, as hybrid torrents have two info hashes, one for the V1 torrent (SHA-1, 32-40 Hexadecimal characters)
    and another for the V2 torrent (SHA2/256 with 64 hexadecimal characters).

The Bittorrent client QBitTorrent uses the info hash for differentiate V1 or V2 torrents:

https://github.com/qbittorrent/qBittorrent/blob/298e4ba852a7764dc4e5783878f51b93581937a4/src/base/bittorrent/torrentdescriptor.cpp#L49-L88

namespace
{
    // BEP9 Extension for Peers to Send Metadata Files


    bool isV1Hash(const QString &string)
    {
        // There are 2 representations for BitTorrent v1 info hash:
        // 1. 40 chars hex-encoded string
        //      == 20 (SHA-1 length in bytes) * 2 (each byte maps to 2 hex characters)
        // 2. 32 chars Base32 encoded string
        //      == 20 (SHA-1 length in bytes) * 1.6 (the efficiency of Base32 encoding)
        const int V1_HEX_SIZE = SHA1Hash::length() * 2;
        const int V1_BASE32_SIZE = SHA1Hash::length() * 1.6;


        return ((((string.size() == V1_HEX_SIZE)) && !string.contains(QRegularExpression(u"[^0-9A-Fa-f]"_s)))
                || ((string.size() == V1_BASE32_SIZE) && !string.contains(QRegularExpression(u"[^2-7A-Za-z]"_s))));
    }


    bool isV2Hash(const QString &string)
    {
        // There are 1 representation for BitTorrent v2 info hash:
        // 1. 64 chars hex-encoded string
        //      == 32 (SHA-2 256 length in bytes) * 2 (each byte maps to 2 hex characters)
        const int V2_HEX_SIZE = SHA256Hash::length() * 2;


        return (string.size() == V2_HEX_SIZE) && !string.contains(QRegularExpression(u"[^0-9A-Fa-f]"_s));
    }


    lt::load_torrent_limits loadTorrentLimits()
    {
        const auto *pref = Preferences::instance();


        lt::load_torrent_limits limits;
        limits.max_buffer_size = static_cast<int>(pref->getTorrentFileSizeLimit());
        limits.max_decode_depth = pref->getBdecodeDepthLimit();
        limits.max_decode_tokens = pref->getBdecodeTokenLimit();


        return limits;
    }
}

I think the second option is more appropriate, and it might even be faster as the index only has to check if there is one or two info hashes, and then the length of the info hash. While the first approach, requires deserializing all the torrent fields. But this is only an assumption and benchmarks might be needed, and performance wise both approaches should not have a big hit in the overall performance of the index.

@mario-nt
Copy link
Contributor

What is your opinion on the above? @da2ce7 @WarmBeer @josecelano

@josecelano
Copy link
Member Author

What is your opinion on the above? @da2ce7 @WarmBeer @josecelano

Hi @mario-nt, correct me if I'm wrong:

  1. In all of them (v1, v2 and hybrid) you calculate the info-hash by hashing the whole info key, but using a different hash function.
  2. We need to know first if the torrent is v2 or hybrid to calculate the info-hash with the SHA-256 function. We first check the content and then we calculate SHA-1 for v1 and SHA256 for v2 and hybrid torrents. So even if you use the hash to know the version, so need to know first the version.

Other considerations

I would like to avoid making the upload even slower adding an extra parsing or hash function if possible. Although I'm not especially worried about performance for this endpoint. But maybe I should. We have been testing with small torrents, for big torrent files parsing twice can take too long.

So I would prefer not to parse the torrent with a V2 struct just to know if it fits into a V2 torrent.

Alternative approach 1

We are already parsing the torrent twice:

  1. First time to calculate the actual info-hash (SHA-1) because we want to include all fields in the info key.-
  2. Secondly to actually map the torrent file into the Rust Torrent struct.

This is just an idea.

The function to calculate the actual info-hash actually parses the whole file into this struct:

struct ParsedInfoDictFromMetainfoFile {
    pub info: Value,
}

Because we only want the info key. If the info key contains the meta version we can assume that the torrent is either a v1 or hybrid torrent.

The calculate_info_hash could be renamed to preparse_torrent and return both:

  • The actual info-hash
  • An enum with two values: v1 and v2-or-hybrid.
  • Optionally, we could return the SHA-256 if the torrent is v2 or hybrid (if we need it).

I think we only need to take the info: Value and extract the meta version from it. So that we do not need to parse again the torrent file.

Finally in the decode_and_validate_torrent_file, when we parse the torrent to build the Torrent strcut if we have an error we can return a new error:

  • DecodeTorrentFileError::InvalidBencodeDataV1
  • DecodeTorrentFileError::InvalidBencodeDataV2OrHybrid

With a new error, we can change the error messages to something like:

  • "Invalid torrent file. Check for missing mandatory fields"
  • "Invalid v2 or hybrid torrent file. Check for missing mandatory fields in v1. We do not support v2-only torrents"

Alternative approach 2

On the other hand, I think the "Alternative approach 1" is too complex. I would implement the straight one (try to parse both TorrentV1 and Torrent V2 structs) even if te performance is worse. Maybe it's just a couple of milliseconds more and it does not worth it adding that complexity.

Alternative approach 3

We create a new strcut abstraction GenericTorrent where we include all the possible fields in a torrent file: v1, v2, non-standard fields, etcetra. We only parse this once, and them we try to build an v1 torrent from this generic strcut. If we cannot build the torrent v1 then we can check without parsing again if the torrent is v2. In fact, this solution can also be used to show more concrete messages when a mandatory field is missing. Maybe it's even a better option than the one I mentioned avobe.

Besides, when we add the log probably we are going to need something like this to extract all the info from the original torrent file.

Conclusion

I would explore alternative approach 3 a little bit. If it not feasible for some reason I can't see now I would go for parsing again for TorrentV2 strcut.

@josecelano
Copy link
Member Author

Article about BitTorrent V2: https://blog.libtorrent.org/2020/09/bittorrent-v2/

@cgbosse cgbosse added this to the v3.0.0-beta milestone Jan 12, 2024
@josecelano josecelano modified the milestones: v3.0.0, v3.1.0 Mar 7, 2024
@josecelano
Copy link
Member Author

I'm moving this to the next big milestone because it's not clear to me if we should support V2 torrents. It looks like other projects are even removing support for V2 torrents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Easy Good for Newcomers Enhancement / Feature Request Something New
Projects
Status: In Progress
Development

No branches or pull requests

4 participants