@febarabash

How to use Scrapy with AWS Lambda?

There are two applications: one on Flask and one on Scrapy. Each is deployed to a separate Lambda through Zappa. The Flask application has 3 endpoints, each of which triggers the Scrapy Lambda through SQS. The trigger itself works fine, but I have 3 questions:

1) Is it possible to somehow remove the execution time limit on the Scrapy Lambda? (I only found a way to raise the limit to 15 minutes, and in that time Scrapy cannot collect all the items.)

2) Is it possible to trigger this Scrapy Lambda through SQS without API Gateway, and is it possible to deploy the application through Zappa so that no API Gateway is created? Or do I need to deploy the scraper manually?

3) If you cannot trigger lambdas without API Gateway, then how can I return the correct response?

Now I have the following function:
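import json
import logging

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
# SearchByPersonSpider, GetDetailSpider and SearchByNameSpider come from the
# project's own spiders module (the import is not shown in this post).

def lambda_event(event, context):
  try:
    # The request body is expected to be a JSON string with a 'spider_name' key.
    data = json.loads(event['body'])
    scrapy_settings = get_project_settings()
    scrapy_settings['ITEM_PIPELINES'] = {
      'sunbiz_spiders.pipelines.DynamodbPipeline': 300,
    }
    scrapy_settings['DOWNLOAD_DELAY'] = 0.5
    process = CrawlerProcess(settings=scrapy_settings)
    # Pick the spider class by name; fall back to SearchByNameSpider.
    if data['spider_name'] == 'SearchByPersonSpider':
      spider = SearchByPersonSpider
    elif data['spider_name'] == 'GetDetailSpider':
      spider = GetDetailSpider
    else:
      spider = SearchByNameSpider
    process.crawl(spider, search_params=data['spider_name'])
    process.start()  # blocks until the crawl finishes
  except Exception:
    # Log the failure instead of silently discarding it.
    logging.exception('Spider run failed')

  return {
    'statusCode': 200,
    'body': json.dumps('All done.'),
  }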
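For context on question 2: an SQS-triggered Lambda does not receive an `event['body']` at the top level. The messages arrive in `event['Records']`, each with its own `body` string. Below is a minimal sketch, not the project's actual handler, of reading both event shapes. The names `extract_payloads` and `handler_sketch` are hypothetical, and the spider launch is left as a placeholder.

```python
import json


def extract_payloads(event):
    """Return the decoded JSON payloads carried by a Lambda event.

    Two shapes are handled:
      * SQS trigger:       {"Records": [{"body": "<json string>"}, ...]}
      * API Gateway proxy: {"body": "<json string>"}
    """
    if "Records" in event:
        return [json.loads(record["body"]) for record in event["Records"]]
    if event.get("body"):
        return [json.loads(event["body"])]
    return []


def handler_sketch(event, context):
    # Hypothetical handler: the real one would start a spider per payload.
    payloads = extract_payloads(event)
    for payload in payloads:
        print("would run spider:", payload.get("spider_name"))
    # The statusCode/body dict is what API Gateway expects back; as far as I
    # know, a plain SQS trigger does not use the return value.
    return {"statusCode": 200, "body": json.dumps({"processed": len(payloads)})}
```

With this shape, the same function can be invoked by SQS directly or kept behind API Gateway while testing.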


Zappa config:

{
    "production": {
        "app_function": "main.lambda_event",
        "aws_region": "us-east-1",
        "profile_name": "default",
        "project_name": "sunbiz-search-s",
        "runtime": "python3.6",
        "s3_bucket": "zappa-envjkpiz6"
    }
}


And when I send a request I get "list index out of range" at werkzeug/test.py, line 1146.
Answers to the question: 1
@inoise, curator of the Amazon Web Services tag
Solution Architect, AWS Certified, Serverless
1. No, you can't raise the time limit above 15 minutes. Would increasing the Lambda's memory be enough for you? Or could you split the work across several Lambda invocations with AWS Step Functions?
2. You don't need Zappa at all, and you really can use SQS and Lambda without API Gateway.
3. I don't have enough Python expertise for this one, sorry.
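To make point 1 more concrete: since the 15-minute cap cannot be lifted, one option is to have each invocation stop the crawl shortly before the Lambda deadline and hand the remaining work to the next invocation (through SQS or a Step Functions state). A minimal sketch of the timing part follows, assuming Scrapy's `CLOSESPIDER_TIMEOUT` setting and the Lambda context method `get_remaining_time_in_millis()`. The helper name `time_budget_settings` and the 30-second margin are hypothetical choices.

```python
def time_budget_settings(context, safety_margin_s=30):
    """Build a Scrapy settings override that closes the spider before the
    Lambda invocation runs out of time.

    `context` is the Lambda context object; `safety_margin_s` (an assumed
    value) leaves room for pipelines to flush and the handler to return.
    """
    remaining_s = context.get_remaining_time_in_millis() // 1000
    # Never return a value below 1: in Scrapy, 0 disables the timeout.
    timeout_s = max(1, remaining_s - safety_margin_s)
    return {"CLOSESPIDER_TIMEOUT": timeout_s}
```

In the handler from the question this could be applied as `scrapy_settings['CLOSESPIDER_TIMEOUT'] = time_budget_settings(context)['CLOSESPIDER_TIMEOUT']` before creating the `CrawlerProcess`. How the unfinished part of the crawl gets re-queued depends on the spiders and is not shown here.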