complete this task "Task Description
Your task is to write a Python program to fetch research papers based on a user-specified query. The
program must identify papers with at least one author affiliated with a pharmaceutical or biotech
company and return the results as a CSV file.
Problem Details

Source of Papers
○ Fetch papers using the PubMed API
○ The program should support PubMed's full query syntax for flexibility.
Output Requirements
○ Return the results as a CSV file with the following columns:
■ PubmedID: Unique identifier for the paper.
■ Title: Title of the paper.
■ Publication Date: Date the paper was published.
■ Non-academic Author(s): Names of authors affiliated with non-academic
institutions.
■ Company Affiliation(s): Names of pharmaceutical/biotech companies.
■ Corresponding Author Email: Email address of the corresponding author.
Command-line Program Features
○ Accept the query as a command-line argument.
○ Provide the following options:
■ -h or --help: Display usage instructions.
■ -d or --debug: Print debug information during execution.
■ -f or --file: Specify the filename to save the results. If this option is not
provided, print the output to the console.
Code Organization and Environment
○ Version Control:
■ Use Git for version control. The code must be hosted on GitHub.
○ Dependencies and Setup:
■ Use Poetry for dependency management and packaging.
■ Ensure that running poetry install sets up all dependencies.
○ Execution:
■ Provide an executable command named get-papers-list via Poetry.
Documentation
○ Include a README.md file with the following details:
■ How the code is organized.
■ Instructions on how to install dependencies and execute the program.
■ Mention any tools (e.g., LLMs or libraries) used to build the program, along with
relevant links.
Evaluation Criteria
○ Functional Requirements:
■ Adherence to the problem statement.
■ Ability to fetch and filter results correctly.
○ Non-functional Requirements:
■ Typed python: Using types everywhere.
■ Performance: Efficiency of API calls and processing.
■ Readability: Clear and maintainable code with appropriate comments and
docstrings.
■ Organization: Logical separation of concerns (e.g., modular functions and
classes).
■ Robustness: Error handling for invalid queries, API failures, or missing data.
Bonus points
Each of these additional points
Break the program into two parts: a module and a command line program that uses the module.
Publish the module in test-pypi.
Notes
● You are free to use LLM tools or other resources to assist in development – please s
● Clearly document any external tools used in the README.md.
● Assume the program will be evaluated by automated scripts, so strict adherence to conventions
is required.
● How to identify non-academic authors? You can apply any heuristics (email addresses, words like
university, labs etc)."

I'll create a Python program that fetches research papers from PubMed and identifies those with pharmaceutical/biotech company affiliations. Let's build this step by step.

Project Structure

First, let's set up the project structure with Poetry:

setup.py

Now, let's create the pyproject.toml file for Poetry:

pyproject.toml

Let's create the core module for fetching papers:

pubmed_paper_fetcher/pubmed_api.py

Now, let's create the paper processor module:

pubmed_paper_fetcher/paper_processor.py

Now, let's create the CLI module:

pubmed_paper_fetcher/cli.py

Let's create the package's __init__.py file:

pubmed_paper_fetcher/__init__.py

Finally, let's create a README.md file:

README.md

Using pip

pip install git+https://github.com/yourusername/pubmed-paper-fetcher.git

Usage

# Basic usage
get-papers-list "cancer therapy"

# Save results to a file
get-papers-list "cancer therapy" --file results.csv

# Specify maximum number of results
get-papers-list "cancer therapy" --max-results 200

# Enable debug mode
get-papers-list "cancer therapy" --debug

# Provide your email (required by NCBI)
get-papers-list "cancer therapy" --email your.email@example.com

# Use NCBI API key for higher request limits
get-papers-list "cancer therapy" --api-key your_api_key

Command-line Options

query: PubMed search query (supports full PubMed query syntax)
-h, --help: Display usage instructions
-d, --debug: Print debug information during execution
-f, --file: Specify the filename to save the results (if not provided, print to console)
-m, --max-results: Maximum number of results to fetch (default: 100)
-e, --email: Email address to identify yourself to NCBI (required)
-k, --api-key: NCBI API key for higher request limits
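
The option list above maps fairly directly onto `argparse`. The following is a minimal sketch, not the project's actual `cli.py` (which is not shown in this conversation): the names `build_parser`, `write_csv`, and `write_rows` are hypothetical, `--email` is assumed to be optional, and the column names are copied from the task description.

```python
import argparse
import csv
import sys
from typing import Dict, List, Optional, Sequence, TextIO

# Column names copied from the task description.
CSV_COLUMNS: List[str] = [
    "PubmedID",
    "Title",
    "Publication Date",
    "Non-academic Author(s)",
    "Company Affiliation(s)",
    "Corresponding Author Email",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the listed options (argparse adds -h/--help itself)."""
    parser = argparse.ArgumentParser(
        prog="get-papers-list",
        description="Fetch PubMed papers with pharma/biotech-affiliated authors.",
    )
    parser.add_argument("query", help="PubMed search query (full PubMed syntax)")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="print debug information during execution")
    parser.add_argument("-f", "--file", default=None,
                        help="CSV file to write; prints to the console if omitted")
    parser.add_argument("-m", "--max-results", type=int, default=100,
                        help="maximum number of results to fetch")
    parser.add_argument("-e", "--email", default=None,
                        help="email address to identify yourself to NCBI")
    parser.add_argument("-k", "--api-key", default=None,
                        help="NCBI API key for higher request limits")
    return parser


def write_csv(rows: Sequence[Dict[str, str]], stream: TextIO) -> None:
    """Write a header line followed by one CSV line per row."""
    writer = csv.DictWriter(stream, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


def write_rows(rows: Sequence[Dict[str, str]], filename: Optional[str]) -> None:
    """Write to `filename` when given, otherwise to standard output."""
    if filename is None:
        write_csv(rows, sys.stdout)
    else:
        with open(filename, "w", newline="", encoding="utf-8") as handle:
            write_csv(rows, handle)
```

Calling `build_parser().parse_args()` in a `main()` function and passing `args.file` to `write_rows` would give the file-or-console behaviour described for `-f`.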

Code Organization

The project is organized as follows:

pubmed_paper_fetcher/: Main package directory
- __init__.py: Package initialization
- cli.py: Command-line interface
- pubmed_api.py: PubMed API client
- paper_processor.py: Paper processing logic
pyproject.toml: Poetry configuration
README.md: Project documentation

How It Works

The tool queries PubMed using the Entrez API from Biopython
For each paper, it extracts author information and affiliations
It identifies non-academic affiliations using heuristics (keywords, patterns)
Papers with at least one author affiliated with a pharmaceutical/biotech company are included in the results
Results are formatted as a CSV file with the required columns
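
The affiliation step can be illustrated with a small keyword classifier. This is a sketch under stated assumptions rather than the code in `pubmed_api.py`: the function name `classify_affiliation` and both keyword lists are hypothetical, and the company name is approximated as the affiliation text up to the first comma-separated segment that contains a company keyword.

```python
import re
from typing import Optional, Tuple

# Hypothetical keyword lists; a real tool would likely need longer ones.
COMPANY_PATTERN = re.compile(
    r"\b(inc|ltd|llc|gmbh|corp(?:oration)?|pharma\w*|biotech\w*|therapeutics)\b",
    re.IGNORECASE,
)
ACADEMIC_PATTERN = re.compile(
    r"\b(universit\w*|college|hospital|institute|school)\b",
    re.IGNORECASE,
)


def classify_affiliation(affiliation: str) -> Tuple[bool, Optional[str]]:
    """Return (is_company, company_name) for one affiliation string."""
    # Negative matching: any academic keyword rules the affiliation out.
    if ACADEMIC_PATTERN.search(affiliation):
        return False, None
    segments = [part.strip() for part in affiliation.split(",")]
    for index, segment in enumerate(segments):
        if COMPANY_PATTERN.search(segment):
            # Keep everything up to and including the matching segment,
            # so "Genentech, Inc." is not cut off at the first comma.
            return True, ", ".join(segments[: index + 1])
    return False, None


if __name__ == "__main__":
    print(classify_affiliation("Pfizer Inc., New York, NY, USA"))
    print(classify_affiliation("Harvard Medical School, Boston, MA, USA"))
```

A paper would then be kept if at least one of its authors has an affiliation for which the first element of the result is `True`. Keyword rules like these misclassify some cases (for example, a company whose name contains "Institute"), so the lists are a starting point only.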

Tools and Libraries Used

Poetry: Dependency management and packaging
Biopython: Interface to NCBI Entrez API
Requests: HTTP library for API calls
tqdm: Progress bar for long-running operations
Python typing: Type hints throughout the code

Development

Running Tests

poetry run pytest

Code Formatting

poetry run black pubmed_paper_fetcher
poetry run isort pubmed_paper_fetcher

Type Checking

poetry run mypy pubmed_paper_fetcher

License

MIT


Let's also create a simple test file:

```py project="PubMed Paper Fetcher" file="tests/test_pubmed_api.py" type="python"
"""
Tests for the PubMed API module.
"""

import unittest
from unittest.mock import patch, MagicMock

from pubmed_paper_fetcher.pubmed_api import PubMedAPI

class TestPubMedAPI(unittest.TestCase):
    """Test cases for the PubMedAPI class."""
    
    def setUp(self):
        """Set up test fixtures."""
        self.api = PubMedAPI(email="test@example.com")
    
    def test_is_non_academic_affiliation_company(self):
        """Test identification of company affiliations."""
        # Test obvious company affiliations
        affiliation = "Pfizer Inc., New York, NY, USA"
        is_company, company_name = self.api.is_non_academic_affiliation(affiliation)
        self.assertTrue(is_company)
        self.assertEqual(company_name, "Pfizer Inc.")
        
        affiliation = "Genentech, Inc., South San Francisco, CA 94080, USA"
        is_company, company_name = self.api.is_non_academic_affiliation(affiliation)
        self.assertTrue(is_company)
        self.assertEqual(company_name, "Genentech, Inc.")
    
    def test_is_non_academic_affiliation_academic(self):
        """Test identification of academic affiliations."""
        # Test obvious academic affiliations
        affiliation = "Department of Biology, Stanford University, Stanford, CA, USA"
        is_company, company_name = self.api.is_non_academic_affiliation(affiliation)
        self.assertFalse(is_company)
        self.assertIsNone(company_name)
        
        affiliation = "Harvard Medical School, Boston, MA, USA"
        is_company, company_name = self.api.is_non_academic_affiliation(affiliation)
        self.assertFalse(is_company)
        self.assertIsNone(company_name)
    
    @patch('pubmed_paper_fetcher.pubmed_api.Entrez')
    def test_search_papers(self, mock_entrez):
        """Test searching for papers."""
        # Mock Entrez.esearch and Entrez.read
        mock_handle = MagicMock()
        mock_entrez.esearch.return_value = mock_handle
        mock_entrez.read.return_value = {"IdList": ["12345", "67890"]}
        
        # Call the method
        result = self.api.search_papers("cancer therapy", max_results=10)
        
        # Verify the result
        self.assertEqual(result, ["12345", "67890"])
        
        # Verify Entrez.esearch was called with the correct parameters
        mock_entrez.esearch.assert_called_once_with(
            db="pubmed", term="cancer therapy", retmax=10
        )

if __name__ == "__main__":
    unittest.main()
```

How to Use the Tool

Install the package using Poetry:
git clone https://github.com/yourusername/pubmed-paper-fetcher.git
cd pubmed-paper-fetcher
poetry install
poetry shell
Run the command-line tool:
get-papers-list "cancer therapy" --email your.email@example.com --file results.csv
The tool will:
- Search PubMed for papers matching your query
- Identify papers with authors affiliated with pharmaceutical/biotech companies
- Save the results to a CSV file with the required columns

Key Features

Comprehensive PubMed Search: Supports full PubMed query syntax
Smart Affiliation Detection: Uses heuristics to identify pharmaceutical and biotech company affiliations
Detailed Output: Provides comprehensive information including PubmedID, title, publication date, non-academic authors, company affiliations, and corresponding author email
Configurable: Command-line options for customizing search parameters and output
Rate Limiting: Throttles requests to respect NCBI's E-utilities usage policy (at most 3 requests per second without an API key, 10 per second with one)

Implementation Details

Identifying Non-Academic Affiliations

The tool uses a combination of techniques to identify non-academic affiliations:

Keyword Analysis: Looks for company-specific keywords (Inc, Ltd, GmbH, Pharma, Biotech, etc.)
Negative Matching: Excludes affiliations containing academic keywords (University, Hospital, Institute, etc.)
Pattern Recognition: Analyzes affiliation structure to identify company names

Error Handling

The tool's error handling covers:

API request failures
Rate limiting and retries
Invalid or missing data
User interruptions

Performance Considerations

Batch processing of PubMed IDs
Progress tracking for long-running operations
Efficient memory usage for large result sets

Bonus Implementation

As a bonus, the package is structured as a reusable module with a separate command-line interface. This allows it to be:

Imported and used programmatically in other Python applications
Run directly from the command line
Extended with additional functionality

To publish to test-pypi, run:

poetry config repositories.test-pypi https://test.pypi.org/legacy/
poetry build
poetry publish -r test-pypi

Future Improvements

Potential enhancements for future versions:

Improved company name extraction using NLP techniques
Support for additional metadata fields
Integration with other bibliographic databases
Enhanced filtering options
Web interface or API endpoint

This tool provides researchers and analysts with an efficient way to identify research papers with pharmaceutical or biotech company affiliations, helping to track industry research activities and potential conflicts of interest.

give me full code

Here's the complete code for the PubMed Paper Fetcher project:

1. Project Setup Files

pyproject.toml

setup.py

2. Package Files

pubmed_paper_fetcher/__init__.py

pubmed_paper_fetcher/pubmed_api.py

pubmed_paper_fetcher/paper_processor.py

pubmed_paper_fetcher/cli.py

3. Test Files

tests/test_pubmed_api.py

tests/test_paper_processor.py

tests/test_cli.py

4. Documentation

README.md

5. Publishing to test-pypi

To publish the package to test-pypi, run the following commands:

```bash
# Configure Poetry to use test-pypi
poetry config repositories.test-pypi https://test.pypi.org/legacy/

# Build the package
poetry build

# Publish to test-pypi
poetry publish -r test-pypi
```

How to Use the Tool

Install the package using Poetry:
git clone https://github.com/yourusername/pubmed-paper-fetcher.git
cd pubmed-paper-fetcher
poetry install
poetry shell
Run the command-line tool:
get-papers-list "cancer therapy" --email your.email@example.com --file results.csv
The tool will:
- Search PubMed for papers matching your query
- Identify papers with authors affiliated with pharmaceutical/biotech companies
- Save the results to a CSV file with the required columns

This completes the full implementation of the PubMed Paper Fetcher project according to the requirements.

PubMed Paper Fetcher

A command-line tool to fetch research papers from PubMed based on a user-specified query, identifying papers with at least one author affiliated with a pharmaceutical or biotech company.

Features

Search PubMed using full query syntax
Identify papers with authors affiliated with pharmaceutical/biotech companies
Output results as CSV with detailed information
Command-line interface with various options

Installation

Using Poetry (Recommended)

# Clone the repository
git clone https://github.com/yourusername/pubmed-paper-fetcher.git
cd pubmed-paper-fetcher

# Install dependencies using Poetry
poetry install

# Activate the virtual environment
poetry shell
