GeneCards Knowledge Graph
200+ Sources. One Graph.
The biomedical data integration you'd spend years building — ready to query today.
What's in the Graph
Core Entities
Genes, proteins, compounds, disorders.
Clinical & Genomic
Variants, trials, phenotypes, GWAS.
Functional & Structural
Pathways, enzymes, domains, PTMs.
Expression & Evolution
Tissue expression, orthologs, enhancers.
Summaries & AI
AI-generated insights and publications.
Cross-References
50 namespaces, universal identifiers.
One Identifier.
The Entire Research Universe.
Every identifier in your data is a gateway.
Enter at any point — traverse the full biomedical landscape.
Connect Your Data
CAS numbers, DrugBank IDs, ICD-10 codes, dbSNP rs numbers, Ensembl IDs, HGNC symbols, ClinVar, UniProt: whatever identifiers your data uses, join through 12.8M cross-references. No entity resolution needed.
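As a rough illustration of what joining through cross-references looks like, the sketch below maps rows keyed by mixed external identifiers onto a single node key. The `XREFS` table, the `gene:` node keys, and the row fields are hypothetical stand-ins, not the shipped schema.

```python
# Hypothetical cross-reference table: (namespace, external id) -> graph node key.
XREFS = {
    ("HGNC", "1100"): "gene:BRCA1",
    ("Ensembl", "ENSG00000012048"): "gene:BRCA1",
    ("UniProt", "P38398"): "gene:BRCA1",
    ("HGNC", "3236"): "gene:EGFR",
}

def resolve(rows):
    """Attach a graph node key to each row; unknown identifiers map to None."""
    return [
        {**row, "node": XREFS.get((row["namespace"], row["id"]))}
        for row in rows
    ]

# Made-up measurements keyed by whichever identifier the source system used.
my_data = [
    {"namespace": "Ensembl", "id": "ENSG00000012048", "value": 2.4},
    {"namespace": "HGNC", "id": "3236", "value": 0.7},
    {"namespace": "HGNC", "id": "0000", "value": 1.1},  # not in the table
]

for row in resolve(my_data):
    print(row["id"], "->", row["node"])
```

Rows that resolve to the same node key can then be grouped regardless of which identifier scheme they arrived with.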
Bridge to Other Graphs
OpenTargets, Monarch, Translator — our Xrefs use the same identifiers. GeneCards connects your data to the public ecosystem.
One Query, Full Context
One Cypher or SQL query spans genes, proteins, diseases, drugs, pathways, and more. No data silos, no data format nightmares.
Traverse Meaningful Relationships
Drug Targeting
Drug-gene interactions with binding affinity (pKi, pIC50) and mechanism of action data.
Disease Association
Gene-disease links with evidence types, scores, and multi-source attribution.
Protein Interactions
Protein-protein interactions with aggregated confidence scores from multiple sources.
GWAS Associations
Gene-trait connections with SNP scores, risk allele frequencies, and odds ratios.
Expression Profiling
Tissue-specific expression with GTEx TPM values and IHC protein levels.
Drug Repurposing
Connecting genes, pathways, diseases, and compounds for new indications.
Protein Modifications
Post-translational modification sites with amino acid positions, types, and subtypes.
Catalytic Reactions
Protein catalytic activities with biochemical equations, Rhea IDs, and EC numbers.
Cancer Associations
TCGA-derived gene-cancer links with statistical significance.
Not Just Links.
Strength. Publications. Evidence. Sources.
Most knowledge graphs give you binary associations.
Ours gives you the full story — binding affinities, confidence scores, PubMed IDs, evidence types, and source attribution on every edge.
Binary Link
Quantified Edge
What Lives on Every Edge
Binding Affinity
pKi, pIC50, pEC50, pKd — drug binding strength on -log10 scale. Available on 5,446 TARGETS edges.
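For readers less used to the -log10 scale, here is a minimal sketch of the conversion from a concentration in nanomolar to its p-value form. The function name `p_value` is ours, chosen for illustration.

```python
import math

def p_value(concentration_nm: float) -> float:
    """Convert a concentration in nM (Ki, IC50, EC50, Kd) to -log10 of its molar value."""
    if concentration_nm <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_nm * 1e-9)

print(round(p_value(10.0), 2))   # 10 nM
print(round(p_value(100.0), 2))  # 100 nM
```

On this scale a higher number means tighter binding, so a filter such as pKi > 7 keeps compounds with Ki below 100 nM.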
Confidence Scores
avgConfidenceScore, avgExperimentalScore — PPI evidence strength on 0-1 scale across 2.8M edges.
Publications
pubmedIds as native arrays on every evidence-backed edge. Query with WHERE '15208697' IN r.pubmedIds.
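One way to wrap that filter for reuse is a small helper that returns a parameterized Cypher string. This is a sketch only: the helper name and the returned columns are assumptions, and the pattern is left unlabeled so it does not presume any particular node labels.

```python
def edges_citing(pmid: str, limit: int = 25):
    """Return a (query, params) pair matching any edge whose pubmedIds contains pmid."""
    query = (
        "MATCH (a)-[r]->(b) "
        "WHERE $pmid IN r.pubmedIds "
        "RETURN a, type(r) AS relation, b "
        "LIMIT $limit"
    )
    # The PubMed ID is passed as a string, matching the quoted form shown above.
    return query, {"pmid": pmid, "limit": limit}

query, params = edges_citing("15208697")
print(query)
print(params)
```

Passing the ID as a parameter rather than formatting it into the string keeps the query plan cacheable and avoids quoting mistakes.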
Evidence Types
TextInference, MolecularBasis, GeneticAssociation — on 598K gene-disorder ASSOCIATED_WITH edges.
Source Attribution
Which databases support each association — DrugBank, ChEMBL, PharmGKB, etc.
Effect Sizes
oddsRatio, beta, riskAlleleFrequency — GWAS statistical evidence across 1.6M associations.
Expression Levels
GTEx RNA expression (TPM) per tissue, IHC microscopy levels, mass spectrometry levels.
Regulatory Scores
GeneHancer regulatory element confidence with elite status across 822K enhancer-gene edges.
Enzyme Kinetics
Km, kcat, Vmax with units and substrates across 7,590 Kinetics nodes.
PTM Positions
Position-specific modifications across 373K PTM nodes, for example phosphorylation at Ser-19 or ubiquitination at Lys-48.
Five Formats. Zero Friction.
Import into your stack in minutes, not months.
Choose the format that fits your infrastructure.
Relational Database
Standard SQL tables for seamless integration with PostgreSQL, MySQL, or any RDBMS. Normalized schema with foreign keys preserving all relationship semantics.
- ✓ SQL-ready normalized tables
- ✓ Relationships preserved via foreign keys
- ✓ Works with any BI/analytics tool
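To give a feel for the relational form, the sketch below builds a tiny in-memory SQLite database and runs a join across it. The table names, column names, and all rows are invented placeholders; the delivered schema will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical three-table layout: two node tables and one edge table.
conn.executescript("""
CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE targets (
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    gene_id     INTEGER NOT NULL REFERENCES gene(id),
    pki         REAL
);
INSERT INTO gene VALUES (1, 'GENE_A'), (2, 'GENE_B');
INSERT INTO compound VALUES (10, 'compound_x'), (11, 'compound_y');
INSERT INTO targets VALUES (10, 1, 8.2), (11, 1, 6.1), (11, 2, 7.5);
""")

# Edge properties such as pKi become ordinary columns on the edge table.
rows = conn.execute("""
    SELECT c.name, g.symbol, t.pki
    FROM targets AS t
    JOIN compound AS c ON c.id = t.compound_id
    JOIN gene AS g ON g.id = t.gene_id
    WHERE t.pki > 7
    ORDER BY t.pki DESC
""").fetchall()
print(rows)
```

The same join pattern carries over to PostgreSQL or MySQL with only dialect-level changes.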
JSON
Structured JSON export with nested objects preserving the full richness of every node and edge. Perfect for APIs, data pipelines, and programmatic access.
- ✓ Nested property objects intact
- ✓ API-ready with schema definition
- ✓ Stream or batch processing
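For stream processing, a generator that reads one record at a time keeps memory use flat. The sketch below assumes a newline-delimited layout and made-up field names (`type`, `source`, `target`, `properties`); check the schema definition shipped with the export for the real structure.

```python
import io
import json

# Stand-in for an export file; one JSON object per line is an assumption.
sample = io.StringIO(
    '{"type": "TARGETS", "source": "compound_x", "target": "GENE_A", '
    '"properties": {"pKi": 8.2, "sources": ["source_a"]}}\n'
    '{"type": "TARGETS", "source": "compound_y", "target": "GENE_A", '
    '"properties": {"pKi": 6.1, "sources": []}}\n'
)

def strong_edges(lines, min_pki=7.0):
    """Yield edge records at or above min_pki without loading the whole file."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        edge = json.loads(line)
        pki = edge.get("properties", {}).get("pKi")
        if pki is not None and pki >= min_pki:
            yield edge

for edge in strong_edges(sample):
    print(edge["source"], edge["target"], edge["properties"]["pKi"])
```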
Native Graph Database
Ready for neo4j-admin database import. Typed CSV headers, native array properties, and full-text search indexing — imports in ~2.5 minutes on commodity hardware.
- ✓ Auto-generated import script included
- ✓ Native array properties for multi-valued fields
- ✓ Full-text search indexable
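For orientation, a typed node file for the Neo4j bulk importer looks roughly like the output of the sketch below. The header syntax (`:ID`, `:LABEL`, type suffixes such as `string[]`) follows the neo4j-admin import conventions; the column names and rows are hypothetical.

```python
import csv
import io

# Hypothetical columns; only the header syntax is the importer's own.
header = ["geneId:ID(Gene)", "symbol", "aliases:string[]", ":LABEL"]
genes = [
    {"id": "g1", "symbol": "GENE_A", "aliases": ["ALIAS1", "ALIAS2"]},
    {"id": "g2", "symbol": "GENE_B", "aliases": []},
]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(header)
for g in genes:
    # Array values are joined with ';', the importer's default array delimiter.
    writer.writerow([g["id"], g["symbol"], ";".join(g["aliases"]), "Gene"])

print(buf.getvalue())
```

Because multi-valued fields are declared as arrays in the header, they arrive in the database as native list properties rather than delimited strings.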
Standards-Compliant TSV
Biolink Model-compliant format with CURIEs for all identifiers. Compatible with the Translator ecosystem, Monarch Initiative, and KGX tooling for graph ML pipelines.
- ✓ CURIEs for all identifiers
- ✓ Translator & Monarch integration compatible
- ✓ KGX-ready for graph ML and analytics
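A CURIE is a compact identifier of the form `PREFIX:local_id`. The sketch below splits CURIEs and lists the prefixes used in a small edge table. The subject, predicate, object column layout follows the usual KGX edge convention, while the rows use placeholder prefixes and are not taken from the product.

```python
import csv
import io

# Placeholder rows; EX_GENE and EX_DISEASE are made-up prefixes.
edges_tsv = (
    "subject\tpredicate\tobject\n"
    "EX_GENE:1\tbiolink:related_to\tEX_DISEASE:7\n"
    "EX_GENE:2\tbiolink:related_to\tEX_GENE:1\n"
)

def split_curie(curie: str):
    """Split 'PREFIX:local_id' into its two parts at the first colon."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id

def prefixes_used(tsv_text: str):
    """Collect the identifier prefixes in the subject and object columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    found = set()
    for row in reader:
        found.add(split_curie(row["subject"])[0])
        found.add(split_curie(row["object"])[0])
    return sorted(found)

print(split_curie("HGNC:1100"))
print(prefixes_used(edges_tsv))
```

A prefix inventory like this is a quick first check that an export and an external graph share identifier namespaces before attempting a merge.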
API Access
Integrate GeneCards data directly into your applications, tools, or platforms with our RESTful API. Programmatic access to the full knowledge graph.
- ✓ RESTful endpoints
- ✓ Integrate into existing pipelines
- ✓ Custom data feeds available
Built for Discovery
From target validation to AI grounding — one graph powers them all.
Target Validation
Traverse gene-disease-compound-pathway networks to validate drug targets with quantitative evidence.
Find all compounds targeting EGFR pathway genes with pKi > 7 — one query, 3 seconds.
Biomarker Discovery
Filter by GWAS signal, expression specificity, and gene constraint to identify candidate biomarkers.
Filter 13.7K GWAS traits by gene constraint score to find high-confidence biomarkers.
Drug Repurposing
Multi-hop queries across pathways and disease associations to find new indications for existing drugs.
Traverse BRCA1 → pathways → shared genes → compounds to find repurposing candidates.
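The gene → pathways → shared genes → compounds traversal can be sketched on toy data as below. The dictionaries stand in for graph edges, and every name in them is a placeholder rather than real graph content.

```python
# Toy adjacency data with placeholder names.
gene_pathways = {
    "GENE_A": {"pathway_1", "pathway_2"},
    "GENE_B": {"pathway_1"},
    "GENE_C": {"pathway_3"},
}
compound_targets = {
    "compound_x": {"GENE_B"},
    "compound_y": {"GENE_C"},
    "compound_z": {"GENE_A"},
}

def repurposing_candidates(seed: str):
    """Compounds that target genes sharing at least one pathway with the seed gene."""
    pathways = gene_pathways.get(seed, set())
    # Hop 2: other genes that sit in any of the seed's pathways.
    shared = {g for g, p in gene_pathways.items() if g != seed and p & pathways}
    # Hop 3: compounds with at least one target among those genes.
    return sorted(c for c, t in compound_targets.items() if t & shared)

print(repurposing_candidates("GENE_A"))
```

Compounds that already target the seed gene directly are excluded here, since the aim is to surface indirect candidates.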
AI / RAG Grounding
Pre-integrated gene summaries and structured data for LLM grounding and retrieval-augmented generation.
174K gene summaries + 59.7K disorder summaries, pre-formatted for LLM ingestion.
Variant Interpretation
Link variants to genes, disorders, proteins, and phenotypes with clinical significance and pathogenicity.
Link 2.8M variants to genes, disorders, and phenotypes with clinical significance.
Competitive Intelligence
Map the clinical trial landscape per target with compound mechanisms, approval status, and trial counts.
113K clinical trials mapped to gene targets with compound mechanisms and status.
Prior Art Research
Validate and protect innovations with comprehensive gene-compound-disorder associations backed by publication evidence.
Trace any gene-drug association to its original PubMed sources for patent filings.
Clinical Diagnosis
Decipher whole genomes and exomes to illuminate links between variants, genes, disorders, and phenotypes.
VarElect has been used in diagnosis of 100,000+ exome and whole genome cases.
Integrated from 200+ Sources
Authoritative biomedical databases, unified and updated with every GeneCards release.
Unified identifiers. Harmonized schemas. Pre-computed relationships.
Genes & Genomics
20+ sources
Disorders & Clinical
15+ sources
Compounds & Drugs
15+ sources
Proteins & Structure
15+ sources
Interactions & Pathways
10+ sources
Expression & Variants
15+ sources
“Accenture had built its genomics platform through manual harvesting of public data sets. It was time consuming, partial in the data obtained, and very expensive time wise. GeneCards completely changed the capabilities of our platform. The data was far more complete, already linked and had multiple data sets we had not discovered. We now have a mature platform thanks to GeneCards. I would recommend investing in this asset to any users that are serious about managing research in the biomedical area.”
Trusted by leading organizations
Ready to Explore the Graph?
Request a sample dataset, schedule a technical walkthrough, or speak with our data licensing team. Available as per-release files or as a hosted instance.
Flexible licensing for academic and commercial use.