[Question] word precision #106
Hi @irux, getting per-word timestamps is currently not supported, and the endpoint we currently use does not provide this information. However, there is probably some way to access it that I don't know about yet, since the YouTube web client uses it. Would you mind sharing your use case? That would give me a better idea of how important this feature is for the module in general.
@jdepoix my use case is a video editing tool. At the moment I am doing ASR with AWS, but with this feature I could avoid extra API calls to other services, and it would be free. By the way, I already found the endpoint. I need to test a few more things, but I think I can open a pull request in the next few days.
Nice! 👍
It is actually quite simple. You only need to append
@jdepoix I am seeing that it behaves as it should. It would actually save some work, because the format is already JSON and you don't need to convert it from XML.
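To illustrate what the JSON format offers for the per-word timing question, here is a minimal Python sketch that turns a json3-style response into (word, start time) pairs. The structure it assumes (an `events` list whose entries carry `tStartMs` and a `segs` list with `utf8` text and an optional `tOffsetMs` relative to the event start) is an assumption about this undocumented format, not a documented schema, and the sample data is made up.

```python
from __future__ import annotations

from typing import Any

# Hypothetical sample shaped like a json3 response (assumed structure:
# "events" -> "segs" with "utf8" text and an optional "tOffsetMs").
SAMPLE_JSON3: dict[str, Any] = {
    "events": [
        {"tStartMs": 0, "dDurationMs": 2000},  # event without "segs" (e.g. layout only)
        {
            "tStartMs": 1200,
            "dDurationMs": 3000,
            "segs": [
                {"utf8": "hello"},
                {"utf8": " world", "tOffsetMs": 480},
                {"utf8": " again", "tOffsetMs": 960},
            ],
        },
        {"tStartMs": 3000, "segs": [{"utf8": "\n"}]},  # newline-only filler
    ]
}


def word_timestamps(data: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (word, absolute start in seconds) pairs from a json3-like dict."""
    words: list[tuple[str, float]] = []
    for event in data.get("events", []):
        start_ms = event.get("tStartMs", 0)
        for seg in event.get("segs", []):
            text = seg.get("utf8", "").strip()
            if not text:
                continue  # skip whitespace/newline-only segments
            offset_ms = seg.get("tOffsetMs", 0)  # assumed: missing offset means 0
            words.append((text, (start_ms + offset_ms) / 1000))
    return words


print(word_timestamps(SAMPLE_JSON3))
# [('hello', 1.2), ('world', 1.68), ('again', 2.16)]
```

Only word start times are derived here; if the data carries no per-word end times, a word's end can at best be approximated by the next word's start.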
Sweet, that looks great! I agree that this could replace the XML request and the parsing that goes along with it. This might be the time to make
We would probably also have to rename a few classes for the naming to make sense in that case:
This would "give room" for a new
@irux you'll also have to test this very extensively, as changing this could potentially completely break this module.
I was doing some research and found something that could help with the WebVTT format problem: if you use fmt=vtt, you get the transcript as VTT. EDIT: You actually have:
as possibilities (that's all I could find to date).
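Since the output format is chosen through the `fmt` query parameter, a small sketch of setting it on a caption URL may help. Plain string concatenation (`+ '&fmt=json3'`) would add a second `fmt` if the URL already had one; the hypothetical helper `with_format` below replaces any existing value instead. The example URL is a placeholder, and only the `vtt` and `json3` values mentioned in this thread are used.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(base_url: str, fmt: str) -> str:
    """Return base_url with its `fmt` query parameter set, replacing any existing one."""
    parts = urlsplit(base_url)
    # Keep every other parameter in its original order, drop any old fmt
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "fmt"]
    query.append(("fmt", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder caption URL; real ones carry many more (signed) parameters.
url = "https://www.youtube.com/api/timedtext?v=VIDEO_ID&lang=en"
print(with_format(url, "vtt"))
# https://www.youtube.com/api/timedtext?v=VIDEO_ID&lang=en&fmt=vtt
print(with_format(with_format(url, "vtt"), "json3"))  # still exactly one fmt
```

Note that `urlencode` re-encodes the decoded values, so the query string may differ cosmetically (for example `+` instead of `%20`) from the original URL.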
@jdepoix I haven't fully dug into everything y'all said yet, but I've been tinkering with the URL provided above (no longer working). This seems to be some kind of undocumented API, so we will need to explore its options to see what does and doesn't work. There are some StackOverflow posts of folks talking about this undocumented API, and it looks like it does quite a bit of work for us. Two useful example URLs, which respond with XML by default:
Transcript
List of supported langs
Remaining Thoughts
With that said, I feel it might be better to keep the current API as it is, as a fallback in case this undocumented API disappears, so we don't create a heavy reliance on it. Maybe we could introduce a "contrib"-like subpackage within I am just throwing out initial thoughts here; I am not entirely certain at the moment. @jdepoix Whichever way you feel is best, I will try to contribute to help make that happen. 🙂
@crhowell it's normal that the link expires after a certain time. That, and your concern about it being undocumented, also applies to the endpoint we are currently using. In fact, it is the same endpoint, only with the addition of the The problem with the simple With the API providing so many formatting options, I am starting to think that we should maybe rework how the formatters work. I will have to think that through when I have more time, but feel free to keep the ideas coming on how to integrate this.
@jdepoix Thanks for addressing that. I think I made an assumption based on what I saw while trying to follow the logic in the code. After directly inspecting a I guess some assumption that Disregard my previous post; my thoughts were primarily based on that assumption. 😅
@jdepoix To adapt the
We would probably want to pass a Transcript object, and, assuming the Transcript objects stay roughly the same as they are now, we could maybe have the formatter alter the I will keep thinking about this as well; I will likely create a feature branch and experiment with some ideas.
Can you add the ability to grab the json3 and not do any post-processing/formatting for now? |
Hi @nikitalita, However, you can call some private methods to get the raw JSON output that is currently being processed:

```python
import requests
from youtube_transcript_api._transcripts import TranscriptListFetcher

video_id = '<video-id>'
fetcher = TranscriptListFetcher(requests.Session())
print(fetcher._extract_captions_json(fetcher._fetch_video_html(video_id), video_id))
```

Does this help in any way? 😊
If you take a look at this issue, it explains the use case: using per-word timestamps to determine more accurately where the sentence breaks are. shashank2123/Punctuation-Restoration-For-Youtube-Transcript#1
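A rough sketch of the sentence-break idea, assuming you already have (word, start time in seconds) pairs such as those extracted from json3 data, and assuming only word start times are available. This hypothetical heuristic simply breaks wherever the gap between consecutive word starts exceeds a threshold; the 0.6 s default is an arbitrary value to tune, and it is a crude stand-in for the model-based approach in the linked issue.

```python
from __future__ import annotations


def split_on_pauses(words: list[tuple[str, float]], min_gap: float = 0.6) -> list[str]:
    """Group (word, start_seconds) pairs into chunks, starting a new chunk
    whenever the gap between consecutive word starts exceeds min_gap."""
    chunks: list[list[str]] = []
    prev_start: float | None = None
    for word, start in words:
        if prev_start is None or start - prev_start > min_gap:
            chunks.append([])  # long pause (or first word): open a new chunk
        chunks[-1].append(word)
        prev_start = start
    return [" ".join(chunk) for chunk in chunks]


words = [("so", 0.0), ("that", 0.2), ("works", 0.45), ("next", 1.5), ("topic", 1.7)]
print(split_on_pauses(words))
# ['so that works', 'next topic']
```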
That does help, thank you :)
For those of you playing at home, here's how to get the json3 URL using the above (substitute the languageCode as appropriate):

```python
import requests
from youtube_transcript_api._transcripts import TranscriptListFetcher

video_id = '<video_id>'
fetcher = TranscriptListFetcher(requests.Session())
captions_json = fetcher._extract_captions_json(fetcher._fetch_video_html(video_id), video_id)

transcript_track_url = ''
for track in captions_json['captionTracks']:
    # Manually created tracks have no 'kind' key, so use .get() to avoid a KeyError
    if track.get('kind') == 'asr' and track['languageCode'] == 'en':
        transcript_track_url = track['baseUrl'] + '&fmt=json3'
        break
print(transcript_track_url)
```
@nikitalita actually, now that I think about it, this can be done more easily, since you just have to add a parameter to the transcript's URL:

```python
from youtube_transcript_api import YouTubeTranscriptApi

video_id = '<video-id>'
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
transcript = transcript_list.find_generated_transcript(['en'])
# _url is a private attribute of the Transcript object
print(transcript._url + '&fmt=json3')
```
Hi, many thanks for the information. There is one thing I was trying to figure out. Consider the video
where the intervals overlap. Looking at the I think this only occurs on segments that are split in two to make overlaying on the video easier. Any thoughts, @nikitalita?
No idea, I've stopped trying to mess with YouTube's subtitles.
Before I took an extended break, I was looking at this overlapping-time issue, since I noticed it too while I was writing up the basic WebVTT formatter. From my understanding, the You can't really even use Is your goal to have something like this:
become this instead?
Note that start + duration should add up to the start time of the next line.
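A minimal sketch of that cleanup, assuming segments with `text`, `start` and `duration` keys (in seconds): each duration is shortened so that start + duration does not run past the next segment's start. Gaps between segments are left as they are, and `remove_overlaps` is a hypothetical helper name.

```python
from __future__ import annotations


def remove_overlaps(segments: list[dict]) -> list[dict]:
    """Return copies of the segments where each duration is clamped so that
    start + duration never exceeds the following segment's start."""
    result = [dict(seg) for seg in segments]  # copy so the caller's data is untouched
    for current, following in zip(result, result[1:]):
        end = current["start"] + current["duration"]
        if end > following["start"]:
            # Round to milliseconds to avoid float noise in the new duration
            current["duration"] = round(following["start"] - current["start"], 3)
    return result


segments = [
    {"text": "first line", "start": 0.0, "duration": 4.0},
    {"text": "second line", "start": 2.5, "duration": 3.0},
    {"text": "third line", "start": 5.5, "duration": 1.0},
]
for seg in remove_overlaps(segments):
    print(seg)
```

Here only the first segment overlaps its successor, so its duration becomes 2.5 while the other two are unchanged.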
Hello, when you watch a video with the autogenerated subtitles turned on, the video and the subtitles seem perfectly synchronized. It gives the impression that they have exact timestamp precision for when each word is spoken. Is it possible to get this precision?
Thank you