Galaxy Community Hub

Using Claude AI for Literature Searches

Introduction

Finding all relevant literature for a research project is crucial but challenging. While AI assistants like Claude and ChatGPT promise to streamline this process, how effective are they really? I conducted a systematic experiment using multiple AI-assisted approaches to survey RNA-seq studies on Candida auris, a dangerous multidrug-resistant fungal pathogen. The results revealed both the power and limitations of AI-assisted literature searches.

The Workflow: A Multi-Strategy Approach

Step 1: Initial Claude Survey

I started by asking Claude to perform a comprehensive literature survey of RNA-seq studies on Candida auris published since 2020. Claude conducted searches across:

  • PubMed and PubMed Central - Standard biomedical literature databases
  • Europe PMC - European alternative with different indexing
  • NCBI BioProject and GEO - Sequencing data repositories

Claude identified 16 papers and provided detailed information including:

  • Genome reference versions used
  • Bioinformatics tools and pipelines
  • Experimental designs and research questions
  • Full-text extraction of methodological details

Figure 1: Claude’s initial survey revealed 16 RNA-seq studies with detailed tool usage and research focus analysis

Step 2: ChatGPT Comparison

To test whether a single AI assistant captures all relevant literature, I performed the same search using ChatGPT. Remarkably, ChatGPT found 9 different papers while searching the same databases (PubMed and Europe PMC) on the same day.

The striking finding: ZERO overlap between Claude and ChatGPT results!

Despite both AI assistants searching identical databases, they found completely different sets of papers. This revealed that:

  • Search query formulation critically affects results
  • Different AI systems have distinct search strategies and biases
  • No single AI tool provides comprehensive coverage
  • The two approaches were complementary, not redundant

Step 3: GEO Repository Search

Next, I uploaded a list of GEO accessions corresponding to C. auris (TaxID: txid498019) and asked Claude to identify all associated papers. This GEO-based search uncovered 11 studies, including 7 unique papers not found by either AI literature search.

These GEO-exclusive papers included:

  • High-impact publications in Nature Microbiology and Nature Communications
  • Foundational 2018 studies establishing key methodologies
  • Papers where RNA-seq was a supporting technique rather than the primary focus
  • Studies using novel approaches (dual-species RNA-seq, single-cell, QuantSeq)
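A repository-first search like this can also be approximated programmatically. The sketch below is a minimal illustration, not the procedure used in this experiment (where the accession list was uploaded to Claude). It only builds request URLs for NCBI E-utilities: `esearch` against the GEO DataSets database (`gds`), then `elink` from `gds` to `pubmed`. The function names and the `retmax` default are assumptions chosen for the example, and nothing is fetched here.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def geo_search_url(taxid: int, retmax: int = 500) -> str:
    """Build an esearch URL listing GEO DataSets records for one organism."""
    params = {
        "db": "gds",                        # Entrez name for GEO DataSets
        "term": f"txid{taxid}[Organism]",   # taxonomy-restricted query
        "retmax": retmax,                   # illustrative page size (assumption)
        "retmode": "json",
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def geo_to_pubmed_url(gds_uids: list[str]) -> str:
    """Build an elink URL that maps GEO DataSets UIDs to linked PubMed records."""
    params = {
        "dbfrom": "gds",
        "db": "pubmed",
        "id": ",".join(gds_uids),
        "retmode": "json",
    }
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


# URL for all GEO DataSets records annotated with the C. auris taxonomy ID
print(geo_search_url(498019))
```

The UIDs returned by the first request would be passed to `geo_to_pubmed_url`. Records without a linked publication still need manual follow-up, which is where an AI assistant or a human curator comes in.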

Step 4: Combined Analysis

The final step involved merging all three search strategies and creating comprehensive visualizations and statistical analyses. Claude integrated:

  • All 32 unique papers identified across the three approaches
  • Temporal trends and research focus analysis
  • Tool standardization and consensus pipelines
  • Source overlap analysis
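The merge and source overlap analysis can be sketched with plain Python sets. This is a minimal illustration and not the analysis script from the project. The helper names are hypothetical, and the example input is only a handful of PubMed IDs taken from the literature table, not the full result set.

```python
from itertools import combinations


def merge_sources(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each paper ID to the set of search strategies that found it."""
    found_by: dict[str, set[str]] = {}
    for source, ids in sources.items():
        for paper_id in ids:
            found_by.setdefault(paper_id, set()).add(source)
    return found_by


def unique_contribution(found_by: dict[str, set[str]]) -> dict[str, int]:
    """Count, per strategy, the papers that only that strategy found."""
    counts: dict[str, int] = {}
    for strategies in found_by.values():
        if len(strategies) == 1:
            (only,) = strategies
            counts[only] = counts.get(only, 0) + 1
    return counts


def pairwise_overlap(sources: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Number of shared paper IDs for every pair of strategies."""
    return {
        (a, b): len(sources[a] & sources[b])
        for a, b in combinations(sorted(sources), 2)
    }


# Illustrative subset of IDs from the literature table (not the full data)
example = {
    "Claude": {"33937102", "35652307", "35968956"},
    "ChatGPT": {"33983315", "34462177"},
    "GEO": {"33937102", "35652307", "34083769"},
}

merged = merge_sources(example)
print(len(merged), "unique papers")
print(unique_contribution(merged))
print(pairwise_overlap(example))
```

Keying everything on a normalized identifier matters: the table mixes PubMed IDs and PMC IDs, so a real merge would first map both to one ID type before deduplicating.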

Key Results

The Numbers Tell a Powerful Story

| Search Strategy | Papers Found | Unique Contribution |
|---|---|---|
| Claude AI survey | 16 | 14 unique papers (43.8%) |
| ChatGPT survey | 9 | 7 unique papers (21.9%) |
| GEO database | 11 | 7 unique papers (21.9%) |
| Combined Total | 32 | 100% more than the best single method |

Figure 2: Comprehensive overview showing papers by year, source distribution, genome usage, and research focus across all three search strategies

Critical Insights

1. Multiple AI Assistants Are Essential

The zero overlap between Claude and ChatGPT despite searching the same databases demonstrates that query formulation and search strategy matter enormously. Different AI systems:

  • Use different keyword combinations
  • Apply different relevance ranking algorithms
  • Have distinct selection biases (Claude emphasized drug resistance; ChatGPT was more diverse)
  • Access full-text differently (affecting verification and detail extraction)

2. Repository Searches Complement Literature Searches

The GEO database search contributed 7 of the 32 unique papers (about 22%), including:

  • Papers where RNA-seq was secondary methodology
  • High-impact studies that don’t emphasize sequencing in titles/abstracts
  • Studies with public data deposition requirements
  • Foundational early work establishing methodologies

3. Combined Approach Doubles Coverage

Using all three strategies yielded 100% more papers than the best single approach:

  • Single best method (Claude): 16 papers
  • Combined approach: 32 papers
  • Coverage improvement: +100%
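The percentages in this post follow from two simple formulas, shown here as a short sketch so the arithmetic is easy to check. The function names are illustrative; the inputs are the counts reported above.

```python
def percent_gain(combined: int, baseline: int) -> float:
    """Relative increase of the combined count over a single-method baseline."""
    return 100.0 * (combined - baseline) / baseline


def percent_share(part: int, total: int) -> float:
    """Share of the combined total contributed by one subset."""
    return 100.0 * part / total


combined_total = 32   # unique papers across all three strategies
claude_total = 16     # best single method

print(f"Gain over Claude alone: {percent_gain(combined_total, claude_total):.1f}%")
print(f"Claude-only share: {percent_share(14, combined_total):.1f}%")
print(f"GEO-only share: {percent_share(7, combined_total):.1f}%")
```

The gain is measured against the best single method. Against the smaller ChatGPT (9) or GEO (11) result sets, the same formula gives a larger gain.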

The gain reflects complementarity rather than redundancy: the three searches overlapped very little.

Figure 3: Detailed analysis showing temporal trends, drug resistance studies over time, source composition, and research focus distribution

Complete Literature Table

The table below shows all 32 unique papers identified through the three search strategies:

| PubMed ID | Year | Found By | GEO/BioProject | Genome | Type of Study |
|---|---|---|---|---|---|
| 29997121 | 2018 | GEO | PRJNA477447 | B8441 | De novo transcriptome: biofilm development |
| 30559369 | 2018 | GEO | PRJNA445471 | B8441/B11221 | Multidrug resistance |
| 32581078 | 2020 | Claude | - | N/A | Biofilm vs. planktonic |
| 32839538 | 2020 | GEO | GSE154911 | Human hg38 | Host PBMC response (QuantSeq) |
| 33077664 | 2020 | GEO | GSE136768 | B8441 | Fluconazole resistance aneuploidy |
| 33937102 | 2021 | Claude/GEO | GSE165762 | B11221 | Clinical isolate transcriptome signatures |
| 33983315 | 2021 | ChatGPT | - | B8441 | Farnesol exposure |
| 33995473 | 2021 | ChatGPT | - | B8441 | Transcriptome signatures |
| 34083769 | 2021 | GEO | GSE171261 | B8441 | LncRNA DINOR stress regulator |
| 34354695 | 2021 | Claude | - | N/A | Drug resistance China |
| 34462177 | 2021 | ChatGPT | - | B8441 | Global stress responses |
| 34485470 | 2021 | Claude | - | GCA_002759435 | Farnesol response |
| 34630944 | 2021 | Claude | - | B8441 V2 | Caspofungin translational profiling |
| 34643421 | 2021 | GEO | GSE180093 | B8441 | Farnesol exposure |
| 34778924 | 2021 | ChatGPT | - | B8441 | Caspofungin proteomics |
| 34788438 | 2021 | Claude | - | B8441 V2 | Small RNA-seq: extracellular vesicles |
| 35142597 | 2022 | GEO | GSE179000 | B8441 + Human | Dual-species: whole blood infection |
| 35649081 | 2022 | ChatGPT | - | B8441/B11221 | Adhesin mutants |
| 35652307 | 2022 | Claude/GEO | GSE190920 | B8441 | AmB resistance |
| 35968956 | 2022 | Claude | - | B8441 | Echinocandin resistance |
| 36913408 | 2023 | Claude | - | GCA_002759435.2 | ALS4 amplification biofilm |
| 37350781 | 2023 | Claude | - | B11221 | Rough vs. smooth morphotypes |
| 37532970 | 2023 | GEO | GSE223953 | B8441 | Tyrosol exposure planktonic |
| 37548469 | 2023 | ChatGPT | - | Isolate 12 | Tyrosol exposure |
| 37769084 | 2023 | Claude | PRJNA904261 | B8441 V3 | SCF1 adhesin (Science) |
| 37925028 | 2025 | ChatGPT | - | B8441 | White-brown switching |
| 38440972 | 2024 | GEO | PRJNA792028 | B8441 | Farnesol/tyrosol biofilms |
| 38537618 | 2024 | ChatGPT | - | B8441/Isolate 12 | Farnesol/tyrosol biofilms |
| 38562758 | 2024 | Claude | PRJNA1086003 | GCA_002759435 | Adhesin redundancy (Nat Commun) |
| 38745637 | 2024 | ChatGPT | - | B8441 | Single-cell RNA-seq: immune evasion |
| 38990436 | 2024 | Claude | - | N/A | Host dermal cells ferroptosis |
| PMC11385638 | 2024 | Claude | - | B11221 | AmB microevolution |
| PMC11459930 | 2024 | Claude | - | B8441 | Pan-drug resistance |
| 40099908 | 2025 | Claude | GSE272878 | B8441 | Flucytosine resistance SNP calling |

Key observations from the table:

  • B8441 genome dominance: 75% of studies use the B8441 (Clade I) reference
  • Peak year 2021: 11 papers (34.4% of total)
  • Source diversity: Each search strategy contributed unique papers
  • High-impact publications: Includes Science and Nature Communications papers
  • Methodological diversity: From de novo assembly to single-cell RNA-seq

Research Insights Gained

Beyond methodology, the comprehensive survey revealed important scientific trends:

Emerging Consensus Pipeline

  • HISAT2 (62.5% of studies) - dominant aligner
  • DESeq2 (68.8% of studies) - gold standard for differential expression
  • B8441 reference genome (75% of studies) - standard reference
  • Standard workflow: FastQC → HISAT2 → HTSeq/featureCounts → DESeq2
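To make the consensus workflow concrete, the sketch below assembles one plausible command line per step for a paired-end sample. It is an assumption-laden illustration, not the pipeline of any surveyed paper: the file naming scheme, thread count, and the `workflow_commands` helper are hypothetical, and only commonly used options of each tool are shown. The commands are built as argument lists and not executed. DESeq2 is an R package, so the final step is left out of this Python sketch.

```python
def workflow_commands(sample: str, index: str, gtf: str, threads: int = 4) -> list[list[str]]:
    """Return argument lists for the QC, alignment, and counting steps."""
    # Hypothetical file naming scheme for one paired-end sample
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    sam = f"{sample}.sam"
    counts = f"{sample}.counts.txt"
    return [
        # Read quality control; the output directory must already exist
        ["fastqc", "-o", "qc", r1, r2],
        # Spliced alignment against a prebuilt HISAT2 index
        ["hisat2", "-p", str(threads), "-x", index, "-1", r1, "-2", r2, "-S", sam],
        # Gene-level counting from the alignment (featureCounts reads SAM or BAM)
        ["featureCounts", "-p", "-T", str(threads), "-a", gtf, "-o", counts, sam],
    ]


for cmd in workflow_commands("sample1", "index/cauris_b8441", "annotation.gtf"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed. The resulting count table is what DESeq2 would take as input in R.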

Research Focus

  • Drug resistance (34.4%) - reflecting urgent clinical threat
  • Stress responses (18.8%)
  • Biofilm formation (12.5%)
  • Host-pathogen interactions (12.5%)

Temporal Evolution

  • 2018-2020: Foundational studies, method establishment
  • 2021: Peak year with 11 papers (34.4% of total)
  • 2022-2025: Specialization, advanced approaches (single-cell, pan-drug resistance)

Lessons Learned: Best Practices for AI-Assisted Literature Searches

DO:

  1. Use multiple AI assistants - Claude and ChatGPT together found 25 papers, 56% more than Claude alone (16)
  2. Search multiple databases - PubMed, Europe PMC, GEO, BioProject complement each other
  3. Check data repositories - GEO/SRA capture papers missed by keyword searches
  4. Verify full-text when possible - Abstracts may miss or mischaracterize methodology
  5. Vary search terms - “RNA-seq” vs “transcriptome” vs “differential expression” yield different results
  6. Combine approaches - Literature searches + repository mining + citation tracking

DON’T:

  • Rely on a single AI assistant or database
  • Assume “same database” means “same results”
  • Trust that one comprehensive search captures everything
  • Overlook papers where your method is secondary
  • Skip manual verification and deduplication

Conclusions

This experiment demonstrated that AI assistants like Claude are powerful tools for literature searches, but they have important limitations:

Strengths:

  • Rapid, systematic searches across multiple databases
  • Detailed information extraction from full-text articles
  • Comprehensive analysis and visualization
  • Reproducible search strategies
  • Integration of diverse data sources

Limitations:

  • No single AI tool is comprehensive
  • Search strategy and query formulation critically matter
  • Different systems have different biases and blind spots
  • Repository searches still require manual guidance

The Bottom Line: For comprehensive literature reviews, use multiple AI assistants with different search strategies, then merge and manually curate the results. In this study, combining Claude, ChatGPT, and GEO database searches uncovered 100% more papers than the best single approach.

Reproducibility

All search strategies, data extraction methods, analysis scripts, and visualizations are documented in the project repository. The combined analysis identified 32 unique Candida auris RNA-seq studies from 2018-2025, providing a comprehensive foundation for future research in this area.


About this work: This analysis was performed as part of RNA-seq methodology research on Candida auris, demonstrating best practices for AI-assisted literature review. All source data, analysis scripts, and detailed methodology are available in the project documentation.