Hi, I've found a bug in listing the available subtitle tracks for videos that have more than one manually created transcript with the same language code. I'm using the latest Python on the latest Windows with PyCharm, but the environment doesn't matter: the result is exactly the same on online services like repl.it.
As an example, you can test with the video "Trump campaign sets sights on another deep-blue state", and more generally with any video published on the Fox News channel. The same applies to quite a few broadcast network channels, as they all have the following caption tracks:
You can run the code below:
from youtube_transcript_api import YouTubeTranscriptApi

subs = YouTubeTranscriptApi.list_transcripts('STjvfE4HVXY')
for sub in subs:
    print(f'code:<{sub.language_code}> auto:<{sub.is_generated}> lang:<{sub.language}>')
to obtain the following result:
This is the PyCharm debug view:
As you can see, track CC1 is missing from the list of tracks available for the video. I believe this is because, among the manually created tracks, both CC1 and DTVCC1 have the same language code 'en', which is used as the only dictionary key; the only separation is between the dictionary of auto-generated tracks and the dictionary of manually created ones. Since CC1 is listed before DTVCC1 (as shown below in the JSON extracted from the video's HTML page), the code first saves CC1 under key 'en' among the manually created tracks. When it later finds DTVCC1 with the same code 'en', also manually created, it overwrites CC1, which then no longer appears.
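The overwrite described above can be reproduced with plain dictionaries. This is only a minimal sketch, not the library's actual code: the track entries are hypothetical, and the field names `languageCode` and `kind` (with `"asr"` marking auto-generated tracks) are assumptions about the page JSON. Only `trackName` is taken from this report.

```python
# Hypothetical caption track entries; field names other than 'trackName'
# are assumptions about the JSON embedded in the video page.
caption_tracks = [
    {"languageCode": "en", "trackName": "CC1", "kind": ""},
    {"languageCode": "en", "trackName": "DTVCC1", "kind": ""},
    {"languageCode": "en", "trackName": "", "kind": "asr"},
]


def index_by_language(tracks):
    """Build one dict per track type, keyed by language code only."""
    manual, generated = {}, {}
    for track in tracks:
        target = generated if track["kind"] == "asr" else manual
        # A second track with the same language code replaces the first one.
        target[track["languageCode"]] = track["trackName"]
    return manual, generated


manual, generated = index_by_language(caption_tracks)
print(manual)  # {'en': 'DTVCC1'} -> CC1 has been overwritten
```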
I believe this bug can be solved by adding 'trackName' to the dictionary key, so that keys such as 'en CC1' and 'en-DTVCC1' would no longer cause the loss of manually created tracks that share a language code.
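A rough sketch of that idea, under the same assumptions as before (hypothetical track entries, assumed `languageCode` and `kind` fields). The helper `make_key` and the `-` separator are illustrative choices, not something the library defines:

```python
# Hypothetical caption track entries; field names other than 'trackName'
# are assumptions about the JSON embedded in the video page.
caption_tracks = [
    {"languageCode": "en", "trackName": "CC1", "kind": ""},
    {"languageCode": "en", "trackName": "DTVCC1", "kind": ""},
    {"languageCode": "en", "trackName": "", "kind": "asr"},
]


def make_key(track):
    """Hypothetical helper: combine language code and track name into one key."""
    name = track["trackName"]
    # Fall back to the bare language code when the track has no name.
    return f'{track["languageCode"]}-{name}' if name else track["languageCode"]


def index_by_language_and_name(tracks):
    """Build one dict per track type, keyed by language code plus track name."""
    manual, generated = {}, {}
    for track in tracks:
        target = generated if track["kind"] == "asr" else manual
        target[make_key(track)] = track
    return manual, generated


manual, generated = index_by_language_and_name(caption_tracks)
print(sorted(manual))     # ['en-CC1', 'en-DTVCC1']
print(sorted(generated))  # ['en']
```

With the composite key, both manually created tracks survive, while unnamed tracks keep their plain language code as key.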
Thank you and let me know please.
Hi @Angel756984,
Sorry for the late reply. This bug has already been reported and discussed in #150. While I agree that including the track name in the key could be a solution, I would consider it a breaking change: track names would have to be included in the language code from then on, which could break existing code.
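To illustrate the compatibility concern, here is a small sketch with plain dictionaries. The function `find_track` and both indexes are hypothetical stand-ins, not the library's API; they only show how a lookup by bare language code would stop matching once track names become part of the key.

```python
# Hypothetical indexes: the current keying scheme and the proposed one.
old_index = {"en": "DTVCC1"}
new_index = {"en-CC1": "CC1", "en-DTVCC1": "DTVCC1"}


def find_track(index, language_codes):
    """Return the first track whose key matches one of the requested codes."""
    for code in language_codes:
        if code in index:
            return index[code]
    return None


print(find_track(old_index, ["en"]))  # DTVCC1
print(find_track(new_index, ["en"]))  # None: existing callers no longer match
```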