How can I get the GitHub repository's URL from inside a running scraper?

I tried using git config remote.origin.url, but it seems that the scraper isn’t run in a git repo.

Would it be possible for morph to set an environment variable, or write out a file in the scraper’s container, containing the repo URL?

For now I have worked around this by manually setting an environment variable on the individual scraper. But it would be awesome if morph could automatically provide it somehow :slight_smile:
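For anyone wanting to do the same, here is a minimal sketch of that workaround in Python. The variable name `MORPH_REPO_URL` and the helper `get_repo_url` are hypothetical, my own choices rather than anything morph provides. I'm assuming the variable is set by hand in the scraper's settings, and that the name needs the `MORPH_` prefix to be passed through to the scraper. The git fallback is only there for local runs, since it finds nothing on morph itself.

```python
import os
import subprocess

# Hypothetical name: set this by hand in the scraper's settings.
# morph does not provide it automatically.
REPO_URL_VAR = "MORPH_REPO_URL"


def get_repo_url(env=None):
    """Return the repository URL, or None if it cannot be determined."""
    env = os.environ if env is None else env

    # 1. Prefer the manually configured environment variable.
    url = env.get(REPO_URL_VAR)
    if url and url.strip():
        return url.strip()

    # 2. Fall back to git, which only helps when running from a local clone.
    try:
        result = subprocess.run(
            ["git", "config", "--get", "remote.origin.url"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed in this environment.
        return None

    url = result.stdout.strip()
    return url or None


if __name__ == "__main__":
    print(get_repo_url())
```

The `env` parameter is only there so the lookup can be exercised with a plain dict instead of the real environment.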


Bumping this as a feature request. Is it possible to get the identity of the scraper from within the scraper, either the name of the repository or the morph/github URL? Either that, or the ability to identify the scraper in the POST webhook? We’d like to be able to associate the data from a scrape to the scraper that generated it so that we can automatically ingest the data via the API.

Thanks for these ideas @jeffreyliu and @pudo :slight_smile:

For now I have worked around this by manually setting an environment variable on the individual scraper. But it would be awesome if morph could automatically provide it somehow

I’d go with that for your immediate needs, @jeffreyliu. Setting this up as an environment variable for everyone wouldn’t be a big change. Might I suggest opening an issue on the openaustralia/morph GitHub repository and describing what you’ll be able to achieve with this feature and why you need it?

Either that, or the ability to identify the scraper in the POST webhook?

There’s already an issue for this: “Include scraper name in data passed in the webhook post request” (Issue #1043 in openaustralia/morph on GitHub).

It would be great if you could add your use case there and, if you’re up for it, open a Pull Request to add the feature :smiley: