Speaker
Description
The R package paperboy is designed as a repository of webscraping scripts for news media sites, with advanced features for quick data retrieval, even for content behind log-ins or anti-scraping measures. Many data scientists and researchers write their own code when they need to retrieve news media content from websites. At the end of a research project, this code often gathers digital dust on researchers' hard drives instead of being made public for others to use. Paperboy offers authors of webscraping scripts a clear path to publish their code and earn co-authorship on the package, while promising users news media data from many websites in a consistent format. With 179 sites covered as of today and a default scraper that often works well enough, paperboy can already facilitate a wide range of research projects.
If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.
No AI tools/services were used.
Additional Material or Paper
A preprint can be found here: https://osf.io/preprints/socarxiv/hu6qw_v1.
A tool demo was given at ICA 2025: https://github.com/JBGruber/ica25_tool-demos.
| Field | Response |
|---|---|
| Keywords: Please list up to 5 keywords to help us find the right session for your contribution. | data mining, open data, news media data, webscraping |
| Virtual Option | This submission is for onsite presentation only |
| Video Recording | Video sharing is fine |
| The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it. | Confirm |
| Interested in serving as reviewer? | sina.chen@gesis.org |