Unique identifier #54

Open
jracusin opened this issue Mar 14, 2023 · 17 comments
@jracusin
Contributor

To cite notice messages individually, we need to add a unique identifier for each message in the producer class. This could consist of a unique string for each producer, followed by an alphanumeric string for each message.

@lpsinger
Member

That could be the record offset.

@Tohuvavohu
Contributor

Is the record offset something set explicitly in the Unified Schema and assigned by the producer?
Or is it assigned by GCN?
How does one find the ID?

@lpsinger
Member

lpsinger commented Apr 2, 2023

A Kafka record is uniquely identified by its topic, partition, and offset. With those three pieces of information, you can command a client to seek to the given record. I would suggest developing some notation combining those three fields.
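
As a minimal sketch of what such a notation could look like, the snippet below joins the three fields with a slash. The `topic/partition/offset` format, the `RecordId` type, and both function names are hypothetical illustrations; nothing here has been adopted by GCN.

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    """The three coordinates that locate one Kafka record."""
    topic: str
    partition: int
    offset: int


def format_record_id(record_id: RecordId) -> str:
    # Hypothetical notation: "<topic>/<partition>/<offset>".
    # Kafka topic names may only contain letters, digits, '.', '_' and '-',
    # so '/' cannot collide with the topic itself.
    return f"{record_id.topic}/{record_id.partition}/{record_id.offset}"


def parse_record_id(text: str) -> RecordId:
    # Split from the right: the last two fields are always integers.
    topic, partition, offset = text.rsplit("/", 2)
    return RecordId(topic, int(partition), int(offset))


example = RecordId("gcn.classic.text.SWIFT_ACTUAL_POINTDIR", 0, 33893)
print(format_record_id(example))
```

A string in this form round-trips through `parse_record_id`, so it could be pasted into a citation and later turned back into the values a client needs to seek to the record.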

@Tohuvavohu
Contributor

Thanks Leo.
But the record's topic, partition, and offset are not known to the producer before submission, right?
So how does one find out what the partition and offset are? Do I have to listen to my own notices and record them?

@lpsinger
Member

lpsinger commented Apr 3, 2023

The topic is certainly known before submission. As for the partition, all of our topics currently use a single partition, although that might not always be the case.

I would think that there is probably a way for a producer to get the offsets of records it has sent shortly after they are flushed.
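
One way this might work, sketched under assumptions: `confluent_kafka` producers accept an `on_delivery` callback in `produce()`, which is called with `(err, msg)` once the broker has acknowledged the record, and `msg.offset()` then holds the assigned offset. The helper name `send_and_get_ids` is hypothetical, and it assumes the `producer` passed in behaves like a `confluent_kafka.Producer` (for example a `gcn_kafka.Producer`).

```python
def send_and_get_ids(producer, topic, payloads):
    """Send payloads and return the (topic, partition, offset) of each
    acknowledged record, plus any delivery errors.

    `producer` is assumed to expose produce(..., on_delivery=...) and
    flush() like confluent_kafka.Producer.
    """
    delivered = []
    errors = []

    def on_delivery(err, msg):
        # Called from flush()/poll() once the broker has responded.
        if err is not None:
            errors.append(err)
        else:
            delivered.append((msg.topic(), msg.partition(), msg.offset()))

    for payload in payloads:
        producer.produce(topic, payload, on_delivery=on_delivery)

    # Block until all outstanding records are delivered or have failed.
    producer.flush()
    return delivered, errors


# Usage sketch (requires real credentials and a topic you may write to):
#   from gcn_kafka import Producer
#   producer = Producer(client_id='your-client-id',
#                       client_secret='your-client-secret')
#   ids, errs = send_and_get_ids(producer, 'your.topic.name', [b'{}'])
```

With this pattern the producer learns the offset only after the send completes, so it can log the identifier but cannot embed it in the record it is sending.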

Why does your producer need to know the offsets of records it has sent?

@Tohuvavohu
Contributor

I need to know the notice ID for reference purposes.
For example, in our retraction notice type, it would be useful to be able to reference the ID of the notice we are retracting.
If you are suggesting that this notice ID be made up partly, or entirely, of the record offset, then the producer needs to know the offset.
Happy to avoid this if you think it is unnecessary.
But it seems generally useful to be able to reference a particular notice directly.

@lpsinger
Member

lpsinger commented Apr 3, 2023

Kafka records can also have keys.

I suggest that you study the Streams Concepts page, particularly the parts on keys, partitions, and timestamps.

@Tohuvavohu
Contributor

@lpsinger @jracusin
I don't care what GCN uses as the unique notice identifier, as long as one exists.
You suggested using the offset. I'm also open to keys; either is fine, as long as it is clear how to work with it.
Let me know when you have chosen a method.

@lpsinger
Member

lpsinger commented Apr 3, 2023

We do not yet have a design for a unique notice identifier. I am just leaving this as background reading.

@lpsinger lpsinger transferred this issue from nasa-gcn/gcn-kafka-python Apr 18, 2023
@dakota002
Contributor

@Tohuvavohu a quick update on this: I made a PR for gcn.nasa.gov that updates the sample code to print out the offset number.

If you want to print it in a consumer, you can add print(f'{message.topic()}: #{message.offset()}') to the consuming loop.
You can use the offset and topic to directly reference specific Notices. Here is an example of retrieving a gcn.classic.text.SWIFT_ACTUAL_POINTDIR notice at offset 33893 (I got this number yesterday using the message.offset() example). The Python gcn-kafka library is a wrapper around confluent_kafka, so you should already have that package installed:

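from confluent_kafka import TopicPartition
from gcn_kafka import Consumer

# Connect as a consumer.
# Warning: don't share the client secret with others.
consumer = Consumer(client_id='your-client-id',
                    client_secret='your-client-secret')

# Commit the desired offset for partition 0 of the topic, so that the
# subscription below starts reading at that record. The commit is made
# synchronously so that it completes before subscribing.
topic = "gcn.classic.text.SWIFT_ACTUAL_POINTDIR"
pt = TopicPartition(topic, 0, 33893)
consumer.commit(offsets=[pt], asynchronous=False)
consumer.subscribe([topic])

# Read a single record and print its topic, offset, and payload.
for message in consumer.consume(num_messages=1):
    if message.error():
        print(message.error())
        continue
    value = message.value()
    print(f'{message.topic()}: {message.offset()}')
    print(value)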

@Tohuvavohu
Contributor

Thanks @dakota002! :)
This should definitely be in the documentation for users (I see it is in your PR now).

@Tohuvavohu
Contributor

Tohuvavohu commented May 19, 2023

Do you think it makes sense to append the ID number to the alert packet itself?
Then people who save the alert.json on receipt will be able to reference the ID.
Don't know if this is feasible given how the offset number is assigned.
Thoughts?

@dakota002
Contributor

I would say it technically already is, if you consider the alert packet to be the whole message object and keep in mind that the JSON is just the value. I definitely agree, though, that this info should be more visible in our documentation. Here is some more information on the Message class.

@Tohuvavohu
Contributor

Right, but can it be appended to the value object itself? I imagine many will want to save the value of the alert to a json file, and being able to consistently reference and find the ID would be very useful for users, I think.

@lpsinger
Member

You don't really know the offset until the Kafka broker has ingested the record. So that's not really possible.
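
Although the producer cannot embed the offset, a consumer already has it at hand when the record arrives and could store it next to the value. The following is a minimal sketch of that idea, assuming the record value is JSON and that `message` behaves like a `confluent_kafka.Message`; the function name and the envelope layout are hypothetical.

```python
import json


def record_to_json(message) -> str:
    """Wrap a consumed record's value together with its coordinates.

    `message` is assumed to expose topic(), partition(), offset() and
    value() like confluent_kafka.Message, with a JSON-encoded value.
    """
    envelope = {
        # Hypothetical envelope layout, not part of any GCN schema.
        "topic": message.topic(),
        "partition": message.partition(),
        "offset": message.offset(),
        "value": json.loads(message.value()),
    }
    return json.dumps(envelope, indent=2)


# Usage sketch inside a consuming loop:
#   with open('alert.json', 'w') as f:
#       f.write(record_to_json(message))
```

Saving the envelope rather than the bare value keeps the identifier with the alert without changing what the producer sends.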

@Tohuvavohu
Contributor

figured that might be the case, thanks.

@blaufuss
Contributor

Just to clarify: is this what's intended to be included when one uses the "reference" field of the gcn-schema/core/FollowUp schema, like:

"reference": { "gcn.notices.LVK.alert": 6666 },

Additionally, confirming there's no ability to access this if you're using older GCN interfaces, correct?
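
If the topic-to-offset form shown above is indeed the intended one (this thread does not confirm it), such a reference could be built from a consumed record roughly as follows. The helper name is hypothetical, and `message` is assumed to behave like a `confluent_kafka.Message`.

```python
import json


def reference_from_message(message) -> dict:
    """Build a {"reference": {topic: offset}} fragment from a consumed record.

    Assumes the reference format quoted in the comment above; `message`
    is assumed to expose topic() and offset().
    """
    return {"reference": {message.topic(): message.offset()}}


# Usage sketch inside a consuming loop:
#   print(json.dumps(reference_from_message(message)))
```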


5 participants