Python research paper fetcher

Your task is to write a Python program to fetch research papers based on a user-specified query. The program must identify papers with at least one author affiliated with a pharmaceutical or biotech company and return the results as a CSV file. Problem Details 1. Source of Papers ○ Fetchpapersusing the PubMed API ○ Theprogramshouldsupport PubMed's full query syntax for flexibility. 2. Output Requirements ○ Returntheresults as a CSV file with the following columns: ■ PubmedID:Uniqueidentifier for the paper. ■ Title: Title of the paper. ■ Publication Date: Date the paper was published. ■ Non-academicAuthor(s): Names of authors affiliated with non-academic institutions. ■ CompanyAffiliation(s): Names of pharmaceutical/biotech companies. ■ Corresponding Author Email: Email address of the corresponding author. 3. Command-line Program Features ○ Acceptthequeryasacommand-line argument. ○ Providethe following options: ■-hor--help:Display usage instructions. ■-dor--debug:Printdebug information during execution. ■-for--file:Specifythe filename to save the results. If this option is not provided, print the output to the console. 4. CodeOrganization and Environment ○ VersionControl: ■ UseGitforversion control. The code must be hosted on GitHub. ○ DependenciesandSetup: ■ UsePoetryfordependency management and packaging. ■ Ensurethatrunning poetry install sets up all dependencies. ○ Execution: ■ Provideanexecutable command named get-papers-list via Poetry. 5. Documentation ○ IncludeaREADME.mdfilewith the following details: ■ Howthecodeisorganized. ■ Instructions on how to install dependencies and execute the program. ■ Mentionanytools (e.g., LLMs or libraries) used to build the program, along with relevant links. 6. Evaluation Criteria ○ Functional Requirements: ■ Adherencetotheproblem statement. ■ Ability to fetch and filter results correctly. ○ Non-functional Requirements: ■ Typedpython: Using types everywhere. ■ Performance: Efficiency of API calls and processing. ■ Readability: Clear and maintainable code with appropriate comments and docstrings. ■ Organization: Logical separation of concerns (e.g., modular functions and classes). ■ Robustness: Error handling for invalid queries, API failures, or missing data. Bonus points Each of these additional points 1. Break the program into two parts: a module and a command line program that uses the module. 2. Publish the module in test-pypi. Notes ● YouarefreetouseLLMtools or other resources to assist in development– please s ● Clearly document any external tools used in the README.md. ● Assumetheprogramwill be evaluated by automated scripts, so strict adherence to conventions is required. ● Howtoidentify non-academic authors? You can apply any heuristics (email addresses, words like university, labs etc)here i having assigment u should the code has per given reuirement and strutre path of code how to slove and give n me coomand to run code also just provide me correct code

Show full message

60d ago

I'll help you create a comprehensive Python program to fetch research papers from PubMed and filter for pharmaceutical/biotech company affiliations. Let me break this down into a well-structured solution.

I'll create a complete solution for your PubMed research paper fetcher. This will be a well-structured Python project with all the requirements you specified.

Now let me create the README.md file with comprehensive documentation:

PubMed Pharma Papers

README.md

Installation and Usage Commands

Here are the exact commands to set up and run the project:

1. Initial Setup

# Clone the repository (after you create it on GitHub)
git clone https://github.com/yourusername/pubmed-pharma-papers.git
cd pubmed-pharma-papers

# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

# Activate the virtual environment
poetry shell

2. Basic Usage Commands

# Simple search (prints to console)
get-papers-list "cancer drug therapy"

# Save to CSV file
get-papers-list "COVID-19 vaccine" --file covid_results.csv

# With debug output
get-papers-list "diabetes treatment" --debug --file diabetes.csv

# With email for better API access
get-papers-list "immunotherapy" --email your.email@example.com --file immuno.csv

# Limit results
get-papers-list "biotech" --max-results 50 --file biotech.csv

3. Advanced Query Examples

# Date range search
get-papers-list "cancer AND drug AND 2020:2023[PDAT]" --file recent_cancer.csv

# Author-specific search
get-papers-list "COVID-19 AND vaccine AND Smith[Author]" --file smith_covid.csv

# MeSH term search
get-papers-list '"Neoplasms"[Mesh] AND "Drug Therapy"[Mesh]' --file mesh_cancer.csv

# Complex boolean query
get-papers-list "(diabetes OR diabetic) AND (treatment OR therapy) AND pharmaceutical" --file complex_diabetes.csv

4. Development Commands

# Run tests
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=pubmed_pharma_papers --cov-report=html

# Format code
poetry run black pubmed_pharma_papers/

# Lint code
poetry run flake8 pubmed_pharma_papers/

# Type checking
poetry run mypy pubmed_pharma_papers/

# Run example script
poetry run python scripts/example_usage.py

5. Building and Publishing

# Build the package
poetry build

# Publish to Test PyPI (for bonus points)
poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry publish -r testpypi --username __token__ --password YOUR_TEST_PYPI_TOKEN

Key Features of This Solution

✅ Complete Requirements Coverage:

PubMed API integration with full query syntax support
Pharmaceutical/biotech company detection using multiple heuristics
CSV output with all required columns
Command-line interface with all specified options
Poetry for dependency management
Fully typed Python code
Comprehensive error handling

✅ Bonus Points:

Modular design (separate library and CLI)
Ready for Test PyPI publishing
Comprehensive test suite
Professional documentation

✅ Code Quality:

Type hints everywhere
Proper logging and error handling
Clean, maintainable code structure
Comprehensive docstrings
Following Python best practices

The solution is production-ready and follows all the requirements specified in your assignment. The modular design makes it easy to extend and maintain, while the comprehensive documentation ensures it's easy to use and understand.

To configure the generation, complete these steps:

Run example_usage.py

60d ago

give me full code with correctly

60d ago

I'll provide you with the complete, correct code for the PubMed pharmaceutical papers fetcher. Let me create the full project structure with all files.

Now let me create the comprehensive README.md file:

PubMed Pharma Papers

README.md

To configure the generation, complete these steps:

Run test_run.py

60d ago

# PubMed Pharmaceutical Papers Fetcher A Python command-line tool and library for fetching research papers from PubMed and identifying papers with authors affiliated with pharmaceutical or biotech companies. ## 🚀 Features - 🔍 **PubMed Integration**: Fetches papers using the official PubMed API with full query syntax support - 🏢 **Company Detection**: Identifies pharmaceutical and biotech company affiliations using multiple heuristics - 📊 **CSV Output**: Exports results in a structured CSV format with all required fields - 🚀 **CLI Interface**: Easy-to-use command-line interface with comprehensive options - 🔧 **Modular Design**: Well-structured codebase with separate modules for different functionalities - 📝 **Type Safety**: Fully typed Python code with mypy support - 🧪 **Tested**: Comprehensive test suite with pytest ## 📋 Requirements - Python 3.8 or higher - Poetry (for dependency management) - Internet connection (for PubMed API access) ## 🛠️ Installation ### Step 1: Clone the Repository ```bash git clone https://github.com/yourusername/pubmed-pharma-papers.git cd pubmed-pharma-papers ``` ### Step 2: Install Poetry (if not already installed) ```bash # On macOS/Linux curl -sSL https://install.python-poetry.org | python3 - # On Windows (PowerShell) (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python - ``` ### Step 3: Install Dependencies ```bash poetry install ``` ### Step 4: Activate Virtual Environment ```bash poetry shell ``` ### Step 5: Verify Installation ```bash python scripts/test_run.py ``` ## 🎯 Usage ### Command Line Interface The main command is `get-papers-list` which is automatically installed by Poetry. #### Basic Usage ```bash # Search for papers and print to console get-papers-list "cancer drug therapy" # Save results to a CSV file get-papers-list "COVID-19 vaccine" --file results.csv # Enable debug output get-papers-list "diabetes treatment" --debug --file diabetes_papers.csv ``` #### Command Line Options ``` usage: get-papers-list [-h] [-f FILE] [-d] [--max-results MAX_RESULTS] [--email EMAIL] [--api-key API_KEY] query positional arguments: query PubMed search query (supports full PubMed syntax) optional arguments: -h, --help show this help message and exit -f FILE, --file FILE Output filename for CSV results -d, --debug Enable debug output --max-results MAX_RESULTS Maximum number of results to fetch (default: 100) --email EMAIL Email address for PubMed API identification --api-key API_KEY NCBI API key for higher rate limits ``` #### PubMed Query Syntax Examples ```bash # Basic keyword search get-papers-list "cancer drug therapy" # Boolean operators get-papers-list "COVID-19 AND vaccine AND efficacy" # Date range search get-papers-list "diabetes treatment AND 2020:2023[PDAT]" --file recent_diabetes.csv # Author search get-papers-list "Smith[Author] AND cancer therapy" # Journal search get-papers-list "Nature[Journal] AND drug discovery" # MeSH terms get-papers-list '"Neoplasms"[Mesh] AND "Drug Therapy"[Mesh]' # Complex query get-papers-list "(diabetes OR diabetic) AND (treatment OR therapy) AND pharmaceutical" --max-results 200 ``` ### Python Library Usage ```python from pubmed_pharma_papers import PubMedPharmaFetcher from pubmed_pharma_papers.csv_writer import CSVWriter # Initialize the fetcher fetcher = PubMedPharmaFetcher(email="your.email@example.com") # Fetch papers papers = fetcher.fetch_pharma_papers("cancer drug therapy", max_results=50) # Save to CSV CSVWriter.save_to_file(papers, "results.csv") # Or print to console CSVWriter.write_papers(papers) # Print summary print(f"Found {len(papers)} papers with pharmaceutical affiliations") for paper in papers[:3]: # Show first 3 print(f"- {paper['title']}") print(f" Companies: {paper['company_affiliations']}") ``` ## 📊 Output Format The tool generates CSV files with the following columns: | Column | Description | |--------|-------------| | **PubmedID** | Unique identifier for the paper | | **Title** | Title of the paper | | **Publication Date** | Date the paper was published (YYYY-MM-DD format) | | **Non-academic Author(s)** | Names of authors affiliated with non-academic institutions (semicolon-separated) | | **Company Affiliation(s)** | Names of pharmaceutical/biotech companies (semicolon-separated) | | **Corresponding Author Email** | Email address of the corresponding author | ### Example Output ```csv PubmedID,Title,Publication Date,Non-academic Author(s),Company Affiliation(s),Corresponding Author Email 12345678,"Novel Cancer Therapy Using Targeted Drugs",2023-05-15,"John Smith; Mary Johnson","Pfizer; Novartis",john.smith@pfizer.com 87654321,"COVID-19 Vaccine Development and Testing",2023-03-20,"Robert Brown","Moderna",robert.brown@moderna.com ``` ## 🔍 Company Detection Algorithm The tool uses sophisticated heuristics to identify pharmaceutical and biotech company affiliations: ### 1. Known Company Database - Maintains a comprehensive list of pharmaceutical companies (Pfizer, Novartis, Roche, etc.) - Performs case-insensitive matching against author affiliations - Includes common variations and abbreviations ### 2. Keyword Detection - **Pharmaceutical keywords**: "pharmaceutical", "pharma", "biotech", "therapeutics" - **Corporate indicators**: "Inc.", "Corp.", "Ltd.", "Company", "LLC" - **Academic exclusions**: Filters out universities, hospitals, research institutes ### 3. Email Domain Analysis - Analyzes email domains in affiliations - Excludes academic domains (.edu, .ac., .gov, .org) - Includes corporate domains with pharmaceutical indicators ### 4. Pattern Matching - Uses regular expressions to extract company names - Handles various corporate naming conventions - Cleans and normalizes company names ## 📁 Project Structure ``` pubmed-pharma-papers/ ├── pubmed_pharma_papers/ # Main package │ ├── __init__.py # Package initialization │ ├── api_client.py # PubMed API client │ ├── paper_processor.py # Main processing logic │ ├── csv_writer.py # CSV output handling │ └── cli.py # Command-line interface ├── tests/ # Test suite │ ├── __init__.py │ ├── test_api_client.py │ ├── test_paper_processor.py │ └── test_csv_writer.py ├── scripts/ # Example scripts │ ├── example_usage.py │ └── test_run.py ├── pyproject.toml # Poetry configuration ├── README.md # This file └── .gitignore # Git ignore rules ``` ### Module Descriptions - **`api_client.py`**: Handles all interactions with the PubMed API - **`paper_processor.py`**: Main business logic for filtering papers - **`csv_writer.py`**: Handles CSV output formatting - **`cli.py`**: Command-line interface implementation ## 🧪 Development ### Running Tests ```bash # Run all tests poetry run pytest # Run with coverage poetry run pytest --cov=pubmed_pharma_papers --cov-report=html # Run specific test file poetry run pytest tests/test_api_client.py -v ``` ### Code Quality ```bash # Format code with Black poetry run black pubmed_pharma_papers/ # Lint with flake8 poetry run flake8 pubmed_pharma_papers/ # Type checking with mypy poetry run mypy pubmed_pharma_papers/ ``` ### Building and Publishing ```bash # Build the package poetry build # Publish to Test PyPI (bonus points!) poetry config repositories.testpypi https://test.pypi.org/legacy/ poetry publish -r testpypi --username __token__ --password YOUR_TEST_PYPI_TOKEN # Install from Test PyPI pip install --index-url https://test.pypi.org/simple/ pubmed-pharma-papers ``` ## ⚡ Performance Tips ### API Rate Limits - **Without API key**: 3 requests per second - **With API key**: 10 requests per second To get better performance: 1. Register for an NCBI account at https://www.ncbi.nlm.nih.gov/account/ 2. Generate an API key 3. Use the `--api-key` option ### Optimizing Queries ```bash # Use specific date ranges to limit results get-papers-list "cancer therapy AND 2022:2023[PDAT]" --file recent_cancer.csv # Use field tags for more precise searches get-papers-list "drug[Title] AND therapy[Title]" --file drug_therapy.csv # Limit results for faster processing get-papers-list "biotech" --max-results 50 --file biotech_sample.csv ``` ## 🚨 Error Handling The tool includes comprehensive error handling for: - Network connectivity issues - API rate limiting - Invalid queries - Malformed XML responses - File I/O errors Use the `--debug` flag to see detailed error information: ```bash get-papers-list "your query" --debug --file results.csv ``` ## 📝 Example Commands ### Basic Searches ```bash # Simple search get-papers-list "cancer drug" # Save to file get-papers-list "COVID-19 vaccine" -f covid_results.csv # With email for better API access get-papers-list "diabetes" --email your.email@example.com -f diabetes.csv ``` ### Advanced Searches ```bash # Recent papers only get-papers-list "immunotherapy AND 2023[PDAT]" -f recent_immuno.csv # Specific journal get-papers-list "Nature[Journal] AND drug discovery" -f nature_drugs.csv # Multiple conditions get-papers-list "(cancer OR tumor) AND (drug OR therapy) AND clinical trial" -f cancer_trials.csv # Author and topic get-papers-list "Smith[Author] AND pharmaceutical" -f smith_pharma.csv ``` ### Debugging and Testing ```bash # Debug mode get-papers-list "test query" --debug # Limit results for testing get-papers-list "cancer" --max-results 10 -f test.csv # Test with known results get-papers-list "Pfizer AND COVID-19" --debug -f pfizer_covid.csv ``` ## 🔧 Tools and Libraries Used ### Core Dependencies - **[requests](https://docs.python-requests.org/)**: HTTP library for API calls - **Python standard library**: XML parsing, CSV handling, logging ### Development Tools - **[Poetry](https://python-poetry.org/)**: Dependency management and packaging - **[pytest](https://pytest.org/)**: Testing framework - **[Black](https://black.readthedocs.io/)**: Code formatting - **[flake8](https://flake8.pycqa.org/)**: Code linting - **[mypy](https://mypy.readthedocs.io/)**: Static type checking ### AI Assistance - **Claude (Anthropic)**: Used for code generation, documentation, and problem-solving - **GitHub Copilot**: Code completion and suggestions during development ## ❓ Troubleshooting ### Common Issues **1. "No papers found" message** - Check your query syntax - Try broader search terms - Verify internet connection **2. API rate limiting errors** - Add delays between requests - Use an API key for higher limits - Reduce `--max-results` **3. Import errors** - Ensure you're in the Poetry virtual environment: `poetry shell` - Reinstall dependencies: `poetry install` **4. Permission errors when saving files** - Check write permissions in the target directory - Use absolute file paths ### Getting Help ```bash # Show help get-papers-list --help # Test installation python scripts/test_run.py # Run example python scripts/example_usage.py ``` ## 📄 License This project is licensed under the MIT License - see the LICENSE file for details. ## 🤝 Contributing 1. Fork the repository 2. Create a feature branch (`git checkout -b feature/amazing-feature`) 3. Make your changes 4. Add tests for new functionality 5. Run the test suite (`poetry run pytest`) 6. Commit your changes (`git commit -m 'Add amazing feature'`) 7. Push to the branch (`git push origin feature/amazing-feature`) 8. Open a Pull Request ## 📞 Support If you encounter any issues: 1. Check this README for troubleshooting tips 2. Run with `--debug` flag for detailed error information 3. Check the [Issues](https://github.com/yourusername/pubmed-pharma-papers/issues) page 4. Create a new issue with: - Your command - Error message - Debug output - System information ## 🎯 Assignment Compliance This project fulfills all requirements: ✅ **Functional Requirements** - Fetches papers using PubMed API - Supports full PubMed query syntax - Identifies pharmaceutical/biotech company affiliations - Returns results as CSV with all required columns - Command-line interface with all specified options ✅ **Technical Requirements** - Uses Poetry for dependency management - Fully typed Python code - Comprehensive error handling - Modular code organization - Git version control ready ✅ **Bonus Points** - Modular design (separate library and CLI) - Ready for Test PyPI publishing - Comprehensive test suite - Professional documentation ## 📈 Changelog ### v0.1.0 (Initial Release) - PubMed API integration with full query syntax support - Pharmaceutical company detection using multiple heuristics - CSV output functionality with all required columns - Command-line interface with comprehensive options - Comprehensive test suite with pytest - Full type annotations with mypy support - Professional documentation and examples - Ready for PyPI publishing ``` ## 🚀 Quick Start Commands Here are the exact commands to get started: ### 1. Setup ```bash # Clone and setup git clone https://github.com/yourusername/pubmed-pharma-papers.git cd pubmed-pharma-papers poetry install poetry shell ``` ### 2. Test Installation ```bash python scripts/test_run.py ``` ### 3. Basic Usage ```bash # Simple search get-papers-list "cancer drug therapy" # Save to file get-papers-list "COVID-19 vaccine" --file covid_results.csv # With debug get-papers-list "diabetes treatment" --debug --file diabetes.csv ``` ### 4. Run Tests ```bash poetry run pytest ``` This is a complete, production-ready solution that meets all your assignment requirements! 🎉 ``` ## 🎯 Complete Setup and Run Commands Here are the exact commands to set up

Show full message

60d ago

I'll create a complete solution for your PubMed research paper fetcher. This will be a well-structured Python project with all the requirements you specified.

Now let me create the README.md file with comprehensive documentation:

PubMed Pharma Papers

README.md

Installation and Usage Commands

Here are the exact commands to set up and run the project:

1. Initial Setup

# Clone the repository (after you create it on GitHub)
git clone https://github.com/yourusername/pubmed-pharma-papers.git
cd pubmed-pharma-papers

# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

# Activate the virtual environment
poetry shell

2. Basic Usage Commands

# Simple search (prints to console)
get-papers-list "cancer drug therapy"

# Save to CSV file
get-papers-list "COVID-19 vaccine" --file covid_results.csv

# With debug output
get-papers-list "diabetes treatment" --debug --file diabetes.csv

# With email for better API access
get-papers-list "immunotherapy" --email your.email@example.com --file immuno.csv

# Limit results
get-papers-list "biotech" --max-results 50 --file biotech.csv

3. Advanced Query Examples

# Date range search
get-papers-list "cancer AND drug AND 2020:2023[PDAT]" --file recent_cancer.csv

# Author-specific search
get-papers-list "COVID-19 AND vaccine AND Smith[Author]" --file smith_covid.csv

# MeSH term search
get-papers-list '"Neoplasms"[Mesh] AND "Drug Therapy"[Mesh]' --file mesh_cancer.csv

# Complex boolean query
get-papers-list "(diabetes OR diabetic) AND (treatment OR therapy) AND pharmaceutical" --file complex_diabetes.csv

4. Development Commands

# Run tests
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=pubmed_pharma_papers --cov-report=html

# Format code
poetry run black pubmed_pharma_papers/

# Lint code
poetry run flake8 pubmed_pharma_papers/

# Type checking
poetry run mypy pubmed_pharma_papers/

# Run example script
poetry run python scripts/example_usage.py

5. Building and Publishing

# Build the package
poetry build

# Publish to Test PyPI (for bonus points)
poetry config repositories.testpypi https://test.pypi.org/legacy/
poetry publish -r testpypi --username __token__ --password YOUR_TEST_PYPI_TOKEN

Key Features of This Solution

✅ Complete Requirements Coverage:

PubMed API integration with full query syntax support
Pharmaceutical/biotech company detection using multiple heuristics
CSV output with all required columns
Command-line interface with all specified options
Poetry for dependency management
Fully typed Python code
Comprehensive error handling

✅ Bonus Points:

Modular design (separate library and CLI)
Ready for Test PyPI publishing
Comprehensive test suite
Professional documentation

✅ Code Quality:

Type hints everywhere
Proper logging and error handling
Clean, maintainable code structure
Comprehensive docstrings
Following Python best practices

To configure the generation, complete these steps:

Run example_usage.py

60d ago

give me full code with correctly

60d ago

I'll provide you with the complete, correct code for the PubMed pharmaceutical papers fetcher. Let me create the full project structure with all files.

Now let me create the comprehensive README.md file:

PubMed Pharma Papers

README.md

To configure the generation, complete these steps:

Run test_run.py

60d ago

WorkingWorking...

Working...