TranscriptsDisabled But it's not disabled (works locally, fails on Cloud machine) #303

Open
atoonk opened this issue Jul 17, 2024 · 172 comments

Comments

@atoonk

atoonk commented Jul 17, 2024

To Reproduce

using youtube-transcript-api-0.6.2:

cat test.py 
from youtube_transcript_api import YouTubeTranscriptApi

print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))

outputs:

python3 ./test.py 
Traceback (most recent call last):
  File "/root/border0-plugin/./test.py", line 3, in <module>
    print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 137, in get_transcript
    return cls.list_transcripts(video_id, proxies, cookies).find_transcript(languages).fetch(preserve_formatting=preserve_formatting)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 71, in list_transcripts
    return TranscriptListFetcher(http_client).fetch(video_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 48, in fetch
    self._extract_captions_json(self._fetch_video_html(video_id), video_id),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 62, in _extract_captions_json
    raise TranscriptsDisabled(video_id)
youtube_transcript_api._errors.TranscriptsDisabled: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=w8rYQ40C9xo! This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!

What code / cli command are you executing?

I am running

from youtube_transcript_api import YouTubeTranscriptApi
print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))

Which Python version are you using?

Python 3.11.6

Which version of youtube-transcript-api are you using?

youtube-transcript-api-0.6.2

Expected behavior

I expected to receive the English transcript.
I can see it in the browser (see screenshot):
[image: Screenshot 2024-07-17 at 2 56 23 PM]

Actual behaviour

The same traceback as shown under "To Reproduce" above, ending in youtube_transcript_api._errors.TranscriptsDisabled.

@Ibrahim-Faisal15

Ibrahim-Faisal15 commented Jul 17, 2024

Yes, the issue is valid, but it does not seem to occur with the link that YouTube gives us via the Share button.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

Hi @atoonk, do you only have this issue with this specific video, or all videos you are trying to retrieve? I can retrieve the subtitles for that video without any issues, which usually means that you are being rate-limited by YouTube (which would also mean that this should happen for all videos).
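
The check described here (one video versus every video) can be sketched as a small helper. This is only a minimal sketch and not part of the library: `failure_ratio` and its `fetch` parameter are hypothetical names, and `fetch` stands for any callable that takes a video ID, such as `YouTubeTranscriptApi.get_transcript`.

```python
from typing import Callable, Iterable


def failure_ratio(fetch: Callable[[str], object], video_ids: Iterable[str]) -> float:
    """Return the fraction of video IDs for which fetch() raised an exception."""
    ids = list(video_ids)
    if not ids:
        raise ValueError("need at least one video ID")
    failures = 0
    for video_id in ids:
        try:
            fetch(video_id)
        except Exception:  # in real use, narrow this to TranscriptsDisabled
            failures += 1
    return failures / len(ids)


# Hypothetical usage (needs network access and the library installed):
# from youtube_transcript_api import YouTubeTranscriptApi
# ratio = failure_ratio(YouTubeTranscriptApi.get_transcript,
#                       ["w8rYQ40C9xo", "dQw4w9WgXcQ"])
```

If the ratio is 1.0 for videos that are known to have captions, the error probably reflects blocking of the machine's IP rather than captions being disabled on one video.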

@SKVNDR

SKVNDR commented Jul 18, 2024

Hi @jdepoix, I encountered the same problem yesterday with every video I tried. Although I don't use the API frequently, I do access it a few times per day. I hope it's not some new restriction from YouTube. I experienced the same problem as @atoonk, and the issue is still present today.

Thanks a lot for your quick response and for this amazing tool; I really like it.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

Hi @SKVNDR, then you're most definitely being blocked by YouTube. The only way to work around this is to change your IP address in any way (VPN, proxy, or assign a new IP if possible).
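
One way to apply this advice in code is to try a direct request first and fall back to one or more proxies only when it fails. The following is a rough sketch under stated assumptions: `fetch_with_fallback` is a hypothetical helper, and `fetch` is assumed to accept a `proxies` keyword argument, as `YouTubeTranscriptApi.get_transcript` does in 0.6.2.

```python
from typing import Callable, Optional, Sequence


def fetch_with_fallback(
    fetch: Callable[..., object],
    video_id: str,
    proxy_candidates: Sequence[Optional[dict]],
) -> object:
    """Try each proxy configuration in order and return the first success."""
    last_error: Optional[Exception] = None
    for proxies in proxy_candidates:
        try:
            # proxies=None means a direct connection without a proxy.
            return fetch(video_id, proxies=proxies)
        except Exception as exc:  # in real use, narrow to TranscriptsDisabled
            last_error = exc
    if last_error is None:
        raise ValueError("proxy_candidates must not be empty")
    # Every candidate failed: re-raise the most recent error.
    raise last_error


# Hypothetical usage; the proxy URL below is a placeholder:
# from youtube_transcript_api import YouTubeTranscriptApi
# transcript = fetch_with_fallback(
#     YouTubeTranscriptApi.get_transcript,
#     "w8rYQ40C9xo",
#     [None, {"https": "http://localhost:8080"}],
# )
```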

@fleerdayo

I can confirm that YouTube is most likely blocking =/
It works from my local dev env but it doesn't work in production, all else being equal.

@alimbekovKZ

I have the same problem, even though I had not used this library before; this was my first try in a long time.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

If you're running your code on a cloud machine, it could be that (depending on your setup) you're getting assigned an IP from a pool that is shared with other machines. So the IP you're using could potentially be blocked without you doing anything. YouTube could also generally blacklist certain IPs that are known to belong to cloud providers (just a guess, I don't know if they actually do that!).

@atoonk
Author

atoonk commented Jul 18, 2024

Ah yes, I tried it from my laptop at home and it works fine now. And indeed, it affected all videos, which is why I thought it was a bug or new behaviour in the YT API.
So, I guess YouTube blocked me (this was on a DigitalOcean machine). Bummer, gotta find a way around that. Any docs on the rate-limit numbers or when folks get added? I only run this once every few weeks and only for a dozen videos or so, so I'm a bit surprised I was blocked. Unless it's all of DigitalOcean.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

Since this is not an official API, there unfortunately is no information on rate limits and when or for how long you will get blocked. People have been reporting different things, so I don't feel like it is consistent either.

@jdepoix
Owner

jdepoix commented Jul 18, 2024

I will pin this issue and leave it open, since there are issues being opened due to this all the time.
Feel free to discuss workarounds and share your experience with YouTube's blocking heuristics, but be aware that there is no proper fix here and probably never will be. That's the nature of using an unofficial API, unfortunately.

@SKVNDR

SKVNDR commented Jul 18, 2024

Same for me. I use a droplet on DigitalOcean, and YouTube probably blocked the IP from there, but using a proxy fixed the issue...

@auspy

auspy commented Jul 18, 2024

> Same for me. I use a droplet on DigitalOcean, and YouTube probably blocked the IP from there, but using a proxy fixed the issue...

How did you set up the proxy? Can you share the code? Did you use a free proxy or a paid one, and how did you obtain it?

@SKVNDR

SKVNDR commented Jul 18, 2024

Hi @auspy,

from youtube_transcript_api import YouTubeTranscriptApi  
YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "https://user:pass@domain:port"})

I'm using a paid proxy from smartproxy.com with the "Residential" offer.
There are probably other better proxies available; I chose this one randomly.
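
Proxy credentials that contain characters such as "@" or ":" break a proxy URL unless they are percent-encoded. A minimal sketch of building the `proxies` mapping safely is below; `build_proxies` is a hypothetical helper, the host and credentials are placeholders, and the default `http` scheme is an assumption that depends on what the proxy provider expects.

```python
from urllib.parse import quote


def build_proxies(user: str, password: str, host: str, port: int,
                  scheme: str = "http") -> dict:
    """Build a requests-style proxies mapping with percent-encoded credentials."""
    # safe="" also encodes "/", "@" and ":" so they cannot be misread as URL syntax.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"{scheme}://{auth}@{host}:{port}"
    # Route both plain HTTP and HTTPS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical usage with placeholder values:
# from youtube_transcript_api import YouTubeTranscriptApi
# proxies = build_proxies("user", "p@ss:word", "proxy.example.com", 8080)
# YouTubeTranscriptApi.get_transcript(video_id, proxies=proxies)
```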

@atoonk
Author

atoonk commented Jul 18, 2024

Confirmed, using a proxy from my droplet worked. I used this to proxy traffic from my DigitalOcean droplet to my local laptop: https://docs.border0.com/docs/expose-a-http-proxy
It lets you expose a proxy on localhost and have it egress on a separate machine (in my case, my laptop).

transcript = YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "http://localhost:8080"})

I can make a quick, more detailed video if folks are interested in how to use that.

@yourdesigncoza

Having the exact same issue, and also using a DigitalOcean droplet.

@ZhimaoLin

Same here. Subscribed to this issue.

@auspy

auspy commented Jul 19, 2024

> confirmed, using a proxy from my droplet worked. I used this to proxy traffic from my digital ocean droplet to my local laptop. https://docs.border0.com/docs/expose-a-http-proxy which will allow you to expose a proxy on localhost and have it egress on a separate machine (in my case my laptop)
>
> transcript = YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "http://localhost:8080"})
>
> can make a more details quick video if folks are interested in how to use that.

Sure, I would love a video on it. Drop the link here.

@auspy

auspy commented Jul 19, 2024

> Hi @auspy,
>
> from youtube_transcript_api import YouTubeTranscriptApi
> YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "https://user:pass@domain:port"})
>
> I'm using a paid proxy from smartproxy.com with the "Residential" offer. There are probably other better proxies available; I chose this one randomly.

Thank you for sharing. This surely looks like a cheap option, but I was looking for something free. I don't want to pay in the initial stages of my project.

@yourdesigncoza

yourdesigncoza commented Jul 19, 2024

> > confirmed, using a proxy from my droplet worked. I used this to proxy traffic from my digital ocean droplet to my local laptop. https://docs.border0.com/docs/expose-a-http-proxy which will allow you to expose a proxy on localhost and have it egress on a separate machine (in my case my laptop)
> >
> > transcript = YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "http://localhost:8080"})
> >
> > can make a more details quick video if folks are interested in how to use that.
>
> sure would love a video on it. drop the link here

@auspy I would appreciate a video, or just more info. I'm all new to proxies, and most of the info online seems to be aimed at more experienced people.

@williamtkelley

Just ran across this issue today, glad I found this thread. I too am on Digital Ocean, running my code in a Docker container. Getting transcripts runs fine locally, but not on DO.

I would appreciate the video mentioned above, as proxies are new to me. If I use my localhost as a proxy, it means I need to leave the machine running 24/7 right? I mean, I guess that's obvious.

@tuganbaev

Yep, same with me. It looks like YouTube blocked many DO servers at once. I didn't make that many requests and I'm banned.

@alimbekovKZ

I also use a DigitalOcean droplet; I think they block IPs from DO servers. I am now using Google Cloud Functions.

@BenjaminKobjolke

I can confirm that the problem is DigitalOcean servers being blocked.
Using a proxy is the solution.

@alimbekovKZ

Now this error also occurs in Google Cloud Functions.

@ethan-0l

ethan-0l commented Aug 5, 2024

Blocked from dedicated OVH too

@atikinkoon

Has anyone faced the same issue on PythonAnywhere?

@jackye315

I haven't tested the limits, but using it sparingly to get a few video transcripts here and there (~20-30 a day) has been completely fine. Unless you are trying to scrape massive amounts of YouTube videos, the WARP proxy should be fine; it has been for me.

@dominicdev

I have this problem in GCF. Is there any fix for GCF? It was working on my local machine but not in GCF.

@rdodev

rdodev commented Dec 9, 2024

@dominicdev there's no code fix for this. It's all because the YT API is authenticated and has rate limits and bans on anonymous requests from IP blocks belonging to cloud providers. What's worse, they disabled API keys, so it now has to be done via OAuth2 to authorize and refresh tokens. So it basically works from home and maybe the office, but likely won't work from any cloud service unless you use something like Apify or a proxy service that hasn't been IP-banned yet.

@flipbytes-dk

> I deployed on Render. Same problem here. Proxying through smart proxy fixed it. Although smart proxy residential plan starts at $7 per GB.

Can you please tell me how you did this on Render?

@dominicdev

Proxying through Smartproxy fixed it; it works for me.

@FVTAL

FVTAL commented Dec 20, 2024

It's definitely not a YouTube blocking issue; I figured this out after hours (I think? haha). It's a cookie issue to start with. If you're hosting in the cloud, you need to copy your Netscape-formatted cookies.txt from your local machine to the cloud and make sure all headers are correct to avoid CORS issues. Make sure the request is formatted correctly; otherwise the video ID will be truncated and the subtitles error will always be returned. I got it working fine on DigitalOcean and am receiving transcripts from any video. It all ultimately came down to cookies and formatting.

@jdepoix
Owner

jdepoix commented Dec 20, 2024

Hi @FVTAL, that is not quite accurate. When you receive this TranscriptsDisabled error, it is because YouTube returns something like "Sign in to confirm you're not a bot". By using a cookie you're basically signing in, which allows you to continue making requests. However, they will ban your YouTube account eventually if you keep making a lot of requests, as reported by @hatemmezlini in this thread:

> You are welcome @jdepoix . They banned the account, I couldn't view youtube videos even from the browser

So yes, you can use cookies to temporarily work around this issue, but your account will be banned eventually. Therefore, I wouldn't really consider this a proper solution, while using proxies is much more scalable!

@FVTAL

FVTAL commented Dec 20, 2024

> Hi @FVTAL, that is not quite accurate. When you receive this TranscriptsDisabled error, it is because YouTube returns something like "Sign in to confirm you're not a bot". By using a cookie you're basically signing in, which allows you to continue doing requests. However, they will ban your YouTube account eventually if you keep doing a lot of requests, as reported by @hatemmezlini in this thread:
>
> > You are welcome @jdepoix . They banned the account, I couldn't view youtube videos even from the browser
>
> So yes, you can use cookies to temporarily work around this issue, but your account will be banned eventually. Therefore, I wouldn't really consider this a proper solution, while using proxies is much more scalable!

Understood, and thank you for clarifying. I appreciate your efforts and thank you for the project.

@modernwitchcraft

> Hi @auspy,
>
> from youtube_transcript_api import YouTubeTranscriptApi
> YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "https://user:pass@domain:port"})
>
> I'm using a paid proxy from smartproxy.com with the "Residential" offer. There are probably other better proxies available; I chose this one randomly.

I can confirm that this works. I also got a $7 package at Smartproxy, and it worked just fine.

@AnasKhan0607

Came across the same issue while running on AWS EC2. A bummer for sure. Does anyone know whether this workaround is good for production applications, where users will be making thousands of requests? Does anyone use it in production?

@chrismaresca

I got it working using https://smartproxy.com/. I'm planning on having a relatively large scale so going to incorporate a proxy pool down the line.

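import logging
import os

from youtube_transcript_api import YouTubeTranscriptApi

logger = logging.getLogger(__name__)
# Toggle the proxy with an environment variable, e.g. USE_PROXY=true.
proxy_enabled = os.getenv("USE_PROXY", "false").lower() == "true"
logger.debug(f"Proxy enabled: {proxy_enabled}")

# get_proxy_dict() and video_id are defined elsewhere (not shown here).
if proxy_enabled:
    transcript = YouTubeTranscriptApi.get_transcript(video_id, proxies=get_proxy_dict())
else:
    transcript = YouTubeTranscriptApi.get_transcript(video_id)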
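
The `get_proxy_dict()` helper is not shown in the snippet above. One hypothetical way to write it, assuming the proxy URL is supplied through an environment variable, is sketched below; the variable name `PROXY_URL` is an assumption made for this sketch, not something the commenter specified.

```python
import os
from typing import Optional


def get_proxy_dict() -> Optional[dict]:
    """Read a proxy URL from the environment; return None when none is set."""
    # PROXY_URL is a hypothetical variable name chosen for this sketch.
    proxy_url = os.getenv("PROXY_URL", "").strip()
    if not proxy_url:
        return None
    # Use the same proxy for plain HTTP and HTTPS traffic.
    return {"http": proxy_url, "https": proxy_url}
```

Returning `None` when the variable is unset means the result can be passed straight to the `proxies` argument, which then falls back to a direct connection.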

@jordanmruczynski

I've been trying with ProxyMesh and it doesn't seem to work.
Could those of you using Smartproxy tell me about your data usage per request? Will 1 GB for $7 be enough for up to 1k requests per month?
Regards

@dss99911

dss99911 commented Dec 25, 2024

I used the Tor proxy, and it seems to work well on an EC2 server so far. It’s free, requires no signup, and doesn’t need complex configuration. Just install it, and you’re ready to use it.

The Tor proxy allows for continuous changes of the exit node if needed and operates with a network of approximately 8,000 servers.

You can check the guide doc here
https://dss99911.github.io/miscellanea/2025/01/08/youtube-transcript-eng.html
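
As a rough sketch of what routing through Tor can look like (the linked guide may do it differently): this assumes a Tor daemon is already running locally on its default SOCKS port 9050 and that SOCKS support for `requests` is installed (`pip install requests[socks]`). `tor_proxies` is a hypothetical helper name.

```python
def tor_proxies(host: str = "127.0.0.1", port: int = 9050) -> dict:
    """Return a requests-style proxies mapping that points at a Tor SOCKS port."""
    # "socks5h" asks the proxy to resolve hostnames, so DNS lookups also go through Tor.
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


# Hypothetical usage (needs a running Tor daemon and network access):
# from youtube_transcript_api import YouTubeTranscriptApi
# YouTubeTranscriptApi.get_transcript("w8rYQ40C9xo", proxies=tor_proxies())
```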

@FVTAL

FVTAL commented Dec 25, 2024 via email

@Amiraz64

@hatemmezlini I’ve been using your service on apify for some time now and I really appreciate it, thanks a lot for it!
When YouTube blocked cloud provider IPs, it was really frustrating, but your service has been a lifesaver. That said, I’ve noticed it takes around 10 seconds to get a transcript on apify, which can be a bit slow since our users often send thousands of requests daily. Do you know of any other services that might be able to handle this faster?

@hatemmezlini

Hi @Amiraz64! Thank you for your kind words. However, promoting a product is outside the scope of this discussion. If you have any questions, please reach out to my email: [email protected]
But to answer your question: in short, yes, we developed a large-scale API just for that purpose:
https://rapidapi.com/invideoiq-invideoiq-default/api/video-transcript-scraper

@orbimatrix

Faced the same issue today on AWS EC2.

Changing the AWS instance location and using proxies didn't work for me.

@FVTAL

FVTAL commented Jan 10, 2025

I think YouTube did indeed change some API stuff, I had an issue too, but it ended up being my WARP container, and had to reconfigure that as WARP had an update too. So unsure what it was, but I did get it working again.

@orbimatrix

@FVTAL can you tell me how to use a WARP container to get rid of this error? I'm also struggling with proxies. I use an LLM from Groq, and once the proxy is set, Cloudflare returns an error saying I can't access the Groq API, even though the Groq API was working on AWS EC2 before I set up the proxy.

@FVTAL

FVTAL commented Jan 10, 2025

> @FVTAL can you tell me how to use WARP Container to get rid of this error because I'm struggling with proxies also and using LLM especially rom groq and if I use groq here then cloudflare gives error you can't access the groq api and groq api was working initially on aws ec2 before setting the proxy.

To get it working I had to spin up an additional DigitalOcean droplet alongside my existing one. On the new droplet, I installed WARP and configured it to be publicly accessible by binding it to 0.0.0.0 and a port of my choice. I then set up a mechanism to route all transcript-api requests through the WARP-configured droplet. The responses from the API come in JSON format, so be sure to parse them accordingly when integrating them into whatever you're using them for.

Here's a curl example:

curl -X POST -L https://url-for-api-transcript-or-ip \
  -H "Content-Type: application/json" \
  -d '{"video_id":"dQw4w9WgXcQ"}'
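
The same request can be sketched in Python using only the standard library. This is a minimal sketch under assumptions: the relay URL is the placeholder from the curl example, the relay is assumed to accept a JSON body with a `video_id` field and to answer with JSON as described above, and both function names are hypothetical.

```python
import json
import urllib.request


def build_relay_request(relay_url: str, video_id: str) -> urllib.request.Request:
    """Build a POST request carrying {"video_id": ...} as a JSON body."""
    body = json.dumps({"video_id": video_id}).encode("utf-8")
    return urllib.request.Request(
        relay_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_via_relay(relay_url: str, video_id: str, timeout: float = 30.0):
    """Send the request to the relay and parse its JSON response."""
    request = build_relay_request(relay_url, video_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage; replace the placeholder URL with your own relay:
# transcript = fetch_via_relay("https://url-for-api-transcript-or-ip", "dQw4w9WgXcQ")
```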

@FVTAL

FVTAL commented Jan 11, 2025

Right - because it sees it as a server IP not a residential IP, thus using WARP would fix this

@orbimatrix

@FVTAL Can you point me to docs on how to use WARP? I couldn't follow what you described above. I'm using Hostinger for my domain and only AWS EC2 or Azure machines, not DigitalOcean droplets.

@tff2011

tff2011 commented Jan 15, 2025

Is it possible to use Cloudflare WARP on Coolify? I'm on a Hetzner VPS.
