complete this task "Task Description
Your task is to write a Python program to fetch research papers based on a user-specified query. The
program must identify papers with at least one author affiliated with a pharmaceutical or biotech
company and return the results as a CSV file.
Problem Details
I'll create a Python program that fetches research papers from PubMed and identifies those with pharmaceutical/biotech company affiliations. Let's build this step by step.
First, let's set up the project structure with Poetry:
setup.py
Now, let's create the pyproject.toml file for Poetry:
pyproject.toml
Let's create the core module for fetching papers:
pubmed_paper_fetcher/pubmed_api.py
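The code for this module is not reproduced here, so the following is only a minimal sketch of what the search step might look like. The test file later in this answer patches `Entrez` (Biopython), which suggests the real module uses `Entrez.esearch`; this sketch swaps in the standard library and the public NCBI E-utilities ESearch endpoint so that it stays self-contained. The helper names `build_esearch_url` and `parse_id_list` are hypothetical.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(
    query: str,
    max_results: int = 100,
    email: Optional[str] = None,
    api_key: Optional[str] = None,
) -> str:
    """Build an ESearch URL for PubMed (hypothetical helper)."""
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    if email:
        params["email"] = email  # identifies the caller to NCBI
    if api_key:
        params["api_key"] = api_key  # raises the request limit
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_id_list(payload: str) -> List[str]:
    """Extract PubMed IDs from an ESearch JSON response body."""
    data = json.loads(payload)
    return list(data.get("esearchresult", {}).get("idlist", []))


def search_papers(query: str, max_results: int = 100, **kwargs) -> List[str]:
    """Run the search over the network and return matching PubMed IDs."""
    url = build_esearch_url(query, max_results, **kwargs)
    with urlopen(url, timeout=30) as response:
        return parse_id_list(response.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call means both can be unit-tested without contacting NCBI.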
Now, let's create the paper processor module:
pubmed_paper_fetcher/paper_processor.py
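As a rough illustration of the affiliation check, here is a keyword-based sketch. The function name and its `(is_company, company_name)` return shape follow the test file shown later in this answer; the keyword list, the regular expression, and the `company_affiliations` helper are assumptions made for this example, not the actual implementation.

```python
import re
from typing import List, Optional, Tuple

# Keywords suggesting an academic or public institution (illustrative, not exhaustive).
ACADEMIC_KEYWORDS = ("university", "college", "school", "institute", "hospital")

# Captures the text up to and including the first corporate suffix, e.g. "Pfizer Inc.".
COMPANY_PATTERN = re.compile(
    r"^(.*?\b(?:Inc|Ltd|LLC|Corp|GmbH|Pharmaceuticals?|Biotech|Therapeutics)\b\.?)",
    re.IGNORECASE,
)


def is_non_academic_affiliation(affiliation: str) -> Tuple[bool, Optional[str]]:
    """Return (True, company_name) for a likely company, else (False, None)."""
    lowered = affiliation.lower()
    # Academic keywords win over company suffixes in this simple heuristic.
    if any(keyword in lowered for keyword in ACADEMIC_KEYWORDS):
        return False, None
    match = COMPANY_PATTERN.match(affiliation)
    if match:
        return True, match.group(1).strip()
    return False, None


def company_affiliations(affiliations: List[str]) -> List[str]:
    """Collect company names from a list of affiliation strings (hypothetical helper)."""
    names = []
    for affiliation in affiliations:
        is_company, name = is_non_academic_affiliation(affiliation)
        if is_company and name is not None:
            names.append(name)
    return names
```

A heuristic like this is deliberately crude: it misses companies without a recognizable suffix, and it rejects company research units whose names contain a word such as "institute".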
Now, let's create the CLI module:
pubmed_paper_fetcher/cli.py
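The CLI code itself is not shown, so this is a sketch of how the documented options could be wired up with `argparse`. The option names and the default of 100 results come from the README text in this answer; the function names `build_parser`, `parse_args`, and `write_csv` are hypothetical, and `--email` is left optional here because the README's basic usage example omits it.

```python
import argparse
import csv
from typing import Dict, List, Optional, TextIO


def build_parser() -> argparse.ArgumentParser:
    """Create the argument parser; -h/--help is added by argparse automatically."""
    parser = argparse.ArgumentParser(
        prog="get-papers-list",
        description="Fetch PubMed papers with pharma/biotech-affiliated authors.",
    )
    parser.add_argument("query", help="PubMed search query")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="print debug information during execution")
    parser.add_argument("-f", "--file",
                        help="file to save results to (default: print to console)")
    parser.add_argument("-m", "--max-results", type=int, default=100,
                        help="maximum number of results to fetch")
    parser.add_argument("-e", "--email",
                        help="email address to identify yourself to NCBI")
    parser.add_argument("-k", "--api-key",
                        help="NCBI API key for higher request limits")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)


def write_csv(rows: List[Dict[str, str]], stream: TextIO) -> None:
    """Write rows as CSV to an open text stream, using the first row's keys as the header."""
    if not rows:
        return
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

Passing a stream to `write_csv` lets the caller choose between `sys.stdout` and a file opened with `newline=""`, which matches the documented behavior of `--file`.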
Let's create the package's __init__.py file:
pubmed_paper_fetcher/__init__.py
Finally, let's create a README.md file:
README.md
Installation:
pip install git+https://github.com/yourusername/pubmed-paper-fetcher.git
Usage:
# Basic usage
get-papers-list "cancer therapy"
# Save results to a file
get-papers-list "cancer therapy" --file results.csv
# Specify maximum number of results
get-papers-list "cancer therapy" --max-results 200
# Enable debug mode
get-papers-list "cancer therapy" --debug
# Provide your email (required by NCBI)
get-papers-list "cancer therapy" --email your.email@example.com
# Use NCBI API key for higher request limits
get-papers-list "cancer therapy" --api-key your_api_key
Command-line options:
- query: PubMed search query (supports full PubMed query syntax)
- -h, --help: Display usage instructions
- -d, --debug: Print debug information during execution
- -f, --file: Specify the filename to save the results (if not provided, print to console)
- -m, --max-results: Maximum number of results to fetch (default: 100)
- -e, --email: Email address to identify yourself to NCBI (required)
- -k, --api-key: NCBI API key for higher request limits
The project is organized as follows:
- pubmed_paper_fetcher/: Main package directory
  - __init__.py: Package initialization
  - cli.py: Command-line interface
  - pubmed_api.py: PubMed API client
  - paper_processor.py: Paper processing logic
- pyproject.toml: Poetry configuration
- README.md: Project documentation
Run the tests:
poetry run pytest
Format the code:
poetry run black pubmed_paper_fetcher
poetry run isort pubmed_paper_fetcher
Type-check the code:
poetry run mypy pubmed_paper_fetcher
License: MIT
Let's also create a simple test file:
```py project="PubMed Paper Fetcher" file="tests/test_pubmed_api.py" type="python"
"""
Tests for the PubMed API module.
"""
import unittest
from unittest.mock import patch, MagicMock

from pubmed_paper_fetcher.pubmed_api import PubMedAPI


class TestPubMedAPI(unittest.TestCase):
    """Test cases for the PubMedAPI class."""

    def setUp(self):
        """Set up test fixtures."""
        self.api = PubMedAPI(email="test@example.com")

    def test_is_non_academic_affiliation_company(self):
        """Test identification of company affiliations."""
        # Test obvious company affiliations
        affiliation = "Pfizer Inc., New York, NY, USA"
        is_company, company_name = self.api.is_non_academic_affiliation(affiliation)
        self.assertTrue(is_company)
        self.assertEqual(company_name, "Pfizer Inc.")

        affiliation = "Genentech, Inc., South San Francisco, CA 94080, USA"
        is_company, company_name = self.api.is_non_academic_affiliation(affiliation)
        self.assertTrue(is_company)
        self.assertEqual(company_name, "Genentech, Inc.")

    def test_is_non_academic_affiliation_academic(self):
        """Test identification of academic affiliations."""
        # Test obvious academic affiliations
        affiliation = "Department of Biology, Stanford University, Stanford, CA, USA"
        is_company, company_name = self.api.is_non_academic_affiliation(affiliation)
        self.assertFalse(is_company)
        self.assertIsNone(company_name)

        affiliation = "Harvard Medical School, Boston, MA, USA"
        is_company, company_name = self.api.is_non_academic_affiliation(affiliation)
        self.assertFalse(is_company)
        self.assertIsNone(company_name)

    @patch('pubmed_paper_fetcher.pubmed_api.Entrez')
    def test_search_papers(self, mock_entrez):
        """Test searching for papers."""
        # Mock Entrez.esearch and Entrez.read
        mock_handle = MagicMock()
        mock_entrez.esearch.return_value = mock_handle
        mock_entrez.read.return_value = {"IdList": ["12345", "67890"]}

        # Call the method
        result = self.api.search_papers("cancer therapy", max_results=10)

        # Verify the result
        self.assertEqual(result, ["12345", "67890"])

        # Verify Entrez.esearch was called with the correct parameters
        mock_entrez.esearch.assert_called_once_with(
            db="pubmed", term="cancer therapy", retmax=10
        )


if __name__ == "__main__":
    unittest.main()
```
Install the package using Poetry:
git clone https://github.com/yourusername/pubmed-paper-fetcher.git
cd pubmed-paper-fetcher
poetry install
poetry shell
Run the command-line tool:
get-papers-list "cancer therapy" --email your.email@example.com --file results.csv
The tool will:
The tool uses a combination of techniques to identify non-academic affiliations:
The tool implements robust error handling:
As a bonus, the package is structured as a reusable module with a separate CLI interface. This allows it to be:
To publish to test-pypi, run:
poetry config repositories.test-pypi https://test.pypi.org/legacy/
poetry build
poetry publish -r test-pypi
Potential enhancements for future versions:
This tool provides researchers and analysts with an efficient way to identify research papers with pharmaceutical or biotech company affiliations, helping to track industry research activities and potential conflicts of interest.
Here's the complete code for the PubMed Paper Fetcher project:
pyproject.toml
setup.py
pubmed_paper_fetcher/__init__.py
pubmed_paper_fetcher/pubmed_api.py
pubmed_paper_fetcher/paper_processor.py
pubmed_paper_fetcher/cli.py
tests/test_pubmed_api.py
tests/test_paper_processor.py
tests/test_cli.py
README.md
## 5. Publishing to test-pypi
To publish the package to test-pypi, you would run the following commands:
```bash
# Configure Poetry to use test-pypi
poetry config repositories.test-pypi https://test.pypi.org/legacy/
# Build the package
poetry build
# Publish to test-pypi
poetry publish -r test-pypi
```
Install the package using Poetry:
git clone https://github.com/yourusername/pubmed-paper-fetcher.git
cd pubmed-paper-fetcher
poetry install
poetry shell
Run the command-line tool:
get-papers-list "cancer therapy" --email your.email@example.com --file results.csv
The tool will:
This completes the full implementation of the PubMed Paper Fetcher project according to the requirements.
A command-line tool to fetch research papers from PubMed based on a user-specified query, identifying papers with at least one author affiliated with a pharmaceutical or biotech company.
# Clone the repository
git clone https://github.com/yourusername/pubmed-paper-fetcher.git
cd pubmed-paper-fetcher
# Install dependencies using Poetry
poetry install
# Activate the virtual environment
poetry shell