Getting transcript in the original language #252

Closed
mgoldenbe opened this issue Jan 28, 2024 · 3 comments

Comments

@mgoldenbe

What is the most straightforward way to get the transcript in the language spoken in the video? Is it to first get the list of auto-generated transcripts? If so:

  • Can I be certain that the list of auto-generated transcripts will contain exactly one element?
  • What can I do when the author has removed the auto-generated transcript, such as for this video: https://youtu.be/rfscVS0vtbw

@GorujoCY

GorujoCY commented Feb 4, 2024

This kind of post belongs in Discussions or on Stack Overflow rather than in an issue.
This is a question, not an issue.

@mgoldenbe
Author

mgoldenbe commented Feb 5, 2024

@GorujoCY I only see the New Issue button. How do I open a discussion?

@jdepoix
Owner

jdepoix commented Feb 28, 2024

Hi @mgoldenbe,
There are three questions here, I think:

What is the most straightforward way to get transcript in the language spoken in the video?

There is none, unfortunately. Currently, English is always used as the default language. There is an open issue for changing the default behaviour to return the default transcript of the video (#133), but it hasn't been implemented yet. However, even that wouldn't guarantee that the transcript you get is in the language spoken in the video.

Can I be certain that the list of auto-generated transcripts will contain exactly one element?

To be honest, I don't know. This module just pulls information from YouTube, and it's hard to give guarantees about anything YouTube is doing. My guess would be that there are some cases where there could be multiple, but in most cases there's only one. So in the cases where there's only one, you could use that as a hint towards which language is spoken in the video. But my experience in working on this module has been that there's basically an exception for everything, so there will almost certainly be some weird cases where this logic doesn't work out. Feel free to share your findings if you play around with this!
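
For anyone who wants to try this, here is a minimal sketch of that heuristic. It is not something the module provides. The helper names `guess_spoken_language` and `guess_for_video` are made up for illustration, and the sketch assumes the `list_transcripts()` call and the `is_generated` / `language_code` attributes as found in the 0.6.x releases; other versions may expose this differently. It treats the result as a hint only when exactly one auto-generated transcript exists, and returns `None` otherwise (including when the auto-generated transcript has been removed).

```python
from types import SimpleNamespace


def guess_spoken_language(transcripts):
    """Return the language code of the only auto-generated transcript, else None.

    `transcripts` is any iterable of objects exposing `is_generated` and
    `language_code` (hypothetical helper, not part of the library).
    """
    generated = [t.language_code for t in transcripts if t.is_generated]
    # Only treat the result as a hint when it is unambiguous.
    return generated[0] if len(generated) == 1 else None


def guess_for_video(video_id):
    # Assumption: the list_transcripts() call of the 0.6.x releases.
    # Imported lazily so the helper above can be tried without network access.
    from youtube_transcript_api import YouTubeTranscriptApi

    return guess_spoken_language(YouTubeTranscriptApi.list_transcripts(video_id))


# Offline stand-ins for transcript objects, just to exercise the logic.
sample = [
    SimpleNamespace(language_code="de", is_generated=True),
    SimpleNamespace(language_code="en", is_generated=False),
]
print(guess_spoken_language(sample))  # de
```

If `guess_for_video` returns a language code, it could then be used to request the transcript in that language explicitly; if it returns `None`, the heuristic gives no answer for that video.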

What can I do when the author removed auto-generated transcript, such as for this video

There's nothing you can do here, I think.

I will close this now, as there's not really a fixable issue here, but feel free to update here if you play around with inferring the language from the auto-generated transcripts, or create a discussion as others have suggested (go to https://github.com/jdepoix/youtube-transcript-api/discussions and press the "New Discussion" button in the top right).

@jdepoix jdepoix closed this as completed Feb 28, 2024
3 participants