Executing commands on scrapinghub cloud #430

Open
hamzza-K opened this issue Jul 25, 2023 · 2 comments

Comments

@hamzza-K

Hi,
I'm trying to deploy a spider that uses Playwright (with scrapy-playwright for the integration). I have the following configuration:
scrapinghub.yml

requirements:
  file: requirements.txt
cmd:
- export PATH=/app/python/bin:$PATH
- playwright install
- playwright install-deps

The deploy logs show that the modules are installed successfully, but how can I execute the commands above? Playwright needs them to be run after a fresh installation. I couldn't find anything about this in the Zyte documentation.

@elacuesta
Member

You need to deploy a custom Docker image in order to have arbitrary commands executed.

Regarding scrapy-playwright, I have a sample project that demonstrates how to use it on Scrapy Cloud. Disclaimer: this is a personal project, it is NOT an officially supported Scrapy stack.

@hamzza-K
Author

hamzza-K commented Jul 27, 2023

I tried to hack a workaround by messing with the scripts argument using setup.py

script.py

```python
import subprocess

def run_bash_command(command):
    try:
        result = subprocess.run(command, shell=True, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        return result.stdout
    except subprocess.CalledProcessError as e:
        print(f"Error running command: {e}")
        return None

commands = ["export PATH=/app/python/bin:$PATH", "playwright install", "playwright install-deps"]

for command in commands:
    output = run_bash_command(command)
    if output is not None:
        print("Command output:")
        print(output)
```


I'm not sure why this doesn't set the PATH correctly and resolve the following warning in the logs:
  `WARNING: The scripts pip, pip3 and pip3.8 are installed in '/app/python/bin' which is not on PATH.`
0: 2023-07-26 12:39:49 INFO Log opened.
1: 2023-07-26 12:39:50 INFO [stdout] Command output:
2: 2023-07-26 12:39:50 INFO [stdout]
3: 2023-07-26 12:39:50 INFO [stdout] Error running command: Command 'playwright install' returned non-zero exit status 127.
4: 2023-07-26 12:39:50 INFO [stdout] Error running command: Command 'playwright install-deps' returned non-zero exit status 127.

This works fine on my local environment. Can you explain why this won't work here?

Thanks for sharing the Dockerfile setup. I'm afraid that's the only way for me to get what I want. I really wish there was a feature for opening a bash shell, just like in Heroku.
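
A note on the exit status 127 in the log above: it is the shell's "command not found" status. Each `subprocess.run(..., shell=True)` call starts its own shell, so the `export PATH=...` in the first command does not carry over to the two `playwright` commands that follow. The sketch below illustrates this with a throwaway variable (`DEMO_VAR` and the `run` helper are made up for the example) and shows one way to pass a modified environment instead. The `/app/python/bin` path is taken from the pip warning and is an assumption about where the `playwright` executable lives.

```python
import os
import subprocess


def run(command, env=None):
    """Run a shell command and return (exit_code, stripped stdout)."""
    result = subprocess.run(
        command, shell=True, stdout=subprocess.PIPE,
        stderr=subprocess.PIPE, text=True, env=env,
    )
    return result.returncode, result.stdout.strip()


# Each call starts a fresh shell, so a variable exported here ...
run("export DEMO_VAR=hello")
# ... is no longer set when the next command runs.
_, lost = run('echo "${DEMO_VAR:-unset}"')

# Passing an explicit environment makes the value visible to the child.
_, kept = run('echo "${DEMO_VAR:-unset}"', env=dict(os.environ, DEMO_VAR="hello"))

# A command the shell cannot find exits with status 127.
missing_code, _ = run("definitely-not-a-real-command-xyz")

# Applied to script.py (assumed path, taken from the pip warning):
playwright_env = dict(os.environ)
playwright_env["PATH"] = "/app/python/bin:" + playwright_env.get("PATH", "")
# run("playwright install", env=playwright_env)
```

This only addresses the "command not found" part. Whether `playwright install-deps` can succeed on the platform is a separate question, and the reply above suggests that a custom Docker image is still the way to go.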
