GeneCards Knowledge Graph

200+ Sources. One Graph.

The biomedical data integration you'd spend years building — ready to query today.

35M+ Nodes
85M+ Relationships
290M+ Properties
200+ Data Sources
5M+ Researchers

What's in the Graph

Core Entities

Genes, proteins, compounds, disorders.

Gene 443K
Protein 261K
Compound 89.7K
Disorder 22.5K

Clinical & Genomic

Variants, trials, phenotypes, GWAS.

Gene Variant 2.8M
GWAS Trait 13.7K
Phenotype 30.9K
Clinical Trial 113K
Genomic Location 460K
Transcript 1.1M

Functional & Structural

Pathways, enzymes, domains, PTMs.

Pathway 4.4K
GO Term 19K
Enzyme 8.8K
Protein Domain 827K
Protein Structure 149K
PTM 373K

Expression & Evolution

Tissue expression, orthologs, enhancers.

RNA Expression 57 tissues
Protein Expression IHC + MS
Orthologs 11.4M
Enhancer 352K
Complex 1.5K
Kinetics 7.6K

Summaries & AI

AI-generated insights and publications.

Gene Summary (+ 21K AI) 153K
Disorder Summary (+ 17K AI) 42K
Disorder Description 8.8K
Publication 1M

Cross-References

50 namespaces, universal identifiers.

Cross-Reference 12.8M
Gene Alias 4.9M
Disorder Alias 228K
Symptom 67K

The Integration Hub

One Identifier.
The Entire Research Universe.

Every identifier in your data is a gateway.
Enter at any point — traverse the full biomedical landscape.

Patient Cohort
ICD10:E11.9
Disorder: Type 2 Diabetes Mellitus
682 Genes, 16,320 Compounds, 3,317 Pathways, Clinical Trials, Phenotypes, Publications

GWAS Study
rs80357906
Gene: BRCA1
64 Compounds, 208 Disorders, 83 Pathways, 12,359 Variants, Expression, Orthologs

Compound Library
CAS:58-08-2
Compound: Caffeine
76 Gene Targets, Binding Affinities, Disorders, Pathways, Clinical Trials, Publications

Variant Pipeline
ClinVar:182956
Gene: TP53
3,499 Variants, 825 Disorders, 475 Compounds, 232 Pathways, 23 Cancer Types, Publications

12.8M cross-references
50 namespaces
202 data sources

Connect Your Data

CAS numbers, DrugBank IDs, ICD-10 codes, dbSNP rsIDs, Ensembl IDs, HGNC symbols, ClinVar IDs, UniProt accessions: whatever identifiers your data uses, join via 12.8M cross-references. No entity resolution needed.
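As a rough illustration of what joining on cross-references can look like, the sketch below resolves identifiers from several namespaces to graph entities through a lookup table. The `XREFS` dictionary, the `Entity` type, and the `resolve` helper are hypothetical stand-ins for the graph's Cross-Reference nodes, not part of the product. The four mappings are the ones shown in the examples above.

```python
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    label: str  # node label, e.g. "Gene" or "Disorder"
    name: str   # display name of the entity


# Hypothetical in-memory stand-in for the Cross-Reference nodes.
# The four mappings mirror the examples shown on this page.
XREFS = {
    "ICD10:E11.9": Entity("Disorder", "Type 2 Diabetes Mellitus"),
    "rs80357906": Entity("Gene", "BRCA1"),
    "CAS:58-08-2": Entity("Compound", "Caffeine"),
    "ClinVar:182956": Entity("Gene", "TP53"),
}


def resolve(identifier: str) -> Optional[Entity]:
    """Look up an external identifier, ignoring surrounding whitespace."""
    return XREFS.get(identifier.strip())


# Rows from an imaginary in-house table keyed by mixed identifiers.
my_rows = [
    ("sample-1", "CAS:58-08-2"),
    ("sample-2", "ICD10:E11.9"),
    ("sample-3", "UNKNOWN:1"),
]
joined = [(row_id, resolve(xref)) for row_id, xref in my_rows]
for row_id, entity in joined:
    print(row_id, entity)
```

Identifiers with no match come back as `None`, so unmatched rows can be reviewed rather than silently dropped.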

Bridge to Other Graphs

OpenTargets, Monarch, Translator — our Xrefs use the same identifiers. GeneCards connects your data to the public ecosystem.

One Query, Full Context

One Cypher or SQL query spans genes, proteins, diseases, drugs, pathways, and more. No data silos, no data format nightmares.

Traverse Meaningful Relationships

Drug Targeting

Gene → Compound
143K relationships

Drug-gene interactions with binding affinity (pKi, pIC50) and mechanism of action data.

MATCH (g:Gene)-[t:TARGETS]->(c:Compound) WHERE t.bestPKi > 7

Disease Association

Gene → Disorder
379K relationships

Gene-disease links with evidence types, scores, and multi-source attribution.

MATCH (g:Gene)-[a:ASSOCIATED_WITH]->(d:Disorder)

Protein Interactions

Protein → Protein
4.9M relationships

Protein-protein interactions with aggregated confidence scores from multiple sources.

MATCH (p1:Protein)-[i:INTERACTS_WITH]->(p2:Protein)

GWAS Associations

Gene → GWAS Trait
1.6M relationships

Gene-trait connections with SNP scores, risk allele frequencies, and odds ratios.

MATCH (g:Gene)-[r:GWAS_ASSOCIATED]->(t:GwasTrait)

Expression Profiling

Gene → Expression
3.6M relationships

Tissue-specific expression with GTEx TPM values and IHC protein levels.

MATCH (g:Gene)-[r:RNA_EXPRESSED_IN]->(a:ExpressionAnatomy)

Drug Repurposing

Gene → Pathway → Compound
multi-hop relationships

Connecting genes, pathways, diseases, and compounds for new indications.

MATCH (g:Gene)-[:IN_PATHWAY]->(p)<-[:IN_PATHWAY]-(g2)-[:TARGETS]->(c)
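To make the multi-hop pattern concrete, here is a minimal sketch that completes it into a full query and then mimics the same traversal on a toy in-memory graph. The `$symbol` parameter, the `symbol` property name, and every node name in the toy data are assumptions made for illustration; only the labels and relationship types come from the pattern above.

```python
# Full query built from the pattern above. The $symbol parameter and the
# `symbol` property name are assumptions about the schema.
REPURPOSING_QUERY = """
MATCH (g:Gene {symbol: $symbol})-[:IN_PATHWAY]->(p)<-[:IN_PATHWAY]-(g2)-[:TARGETS]->(c)
WHERE g2 <> g
RETURN DISTINCT c
"""

# Toy data with placeholder names, used only to show what the pattern matches.
IN_PATHWAY = {
    "GENE_A": {"PATHWAY_1"},
    "GENE_B": {"PATHWAY_1"},
    "GENE_C": {"PATHWAY_2"},
}
TARGETS = {"GENE_B": {"COMPOUND_X"}, "GENE_C": {"COMPOUND_Y"}}


def repurposing_candidates(gene: str) -> set[str]:
    """Compounds linked to other genes that share a pathway with `gene`."""
    pathways = IN_PATHWAY.get(gene, set())
    found: set[str] = set()
    for other, other_pathways in IN_PATHWAY.items():
        # Skip the starting gene and genes with no pathway in common.
        if other != gene and pathways & other_pathways:
            found |= TARGETS.get(other, set())
    return found


print(repurposing_candidates("GENE_A"))
```

In the toy data only `GENE_B` shares a pathway with `GENE_A`, so the traversal returns the compound linked to `GENE_B` and ignores the one linked to `GENE_C`.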

Protein Modifications

Protein → PTM
373K relationships

Post-translational modification sites with amino acid positions, types, and subtypes.

MATCH (p:Protein)-[:HAS_PTM]->(m:PTM)

Catalytic Reactions

Protein → Enzyme
8.8K relationships

Protein catalytic activities with biochemical equations, Rhea IDs, and EC numbers.

MATCH (p:Protein)-[:CATALYZES]->(e:Enzyme)

Cancer Associations

Gene → Cancer Type
973 relationships

TCGA-derived gene-cancer links with statistical significance.

MATCH (g:Gene)-[:CANCER_ASSOCIATED]->(c:CancerType)

Every Edge Tells A Story

Not Just Links.
Strength. Publications. Evidence. Sources.

Most knowledge graphs give you binary associations.
Ours gives you the full story — binding affinities, confidence scores, PubMed IDs, evidence types, and source attribution on every edge.

Conventional Graph

Binary Link

BRCA1 → Breast Cancer

GeneCards Knowledge Graph

Quantified Edge

BRCA1 → Breast Cancer (edge labels: pKi: 8.2, PMIDs, sources: 5, score: 0.85)
Edge properties: pKi: 8.2, pIC50: 7.4, mechanisms: [Inhibition], sources: [DrugBank, ChEMBL], score: 0.85

What Lives on Every Edge

Binding Affinity

pKi, pIC50, pEC50, pKd: drug binding strength on a -log10 scale. Available on 5,446 TARGETS edges.

pKi: 8.2 (Gefitinib → EGFR)
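For readers less used to the -log10 scale, the short sketch below converts between a molar constant and its p-value form. The helper names are illustrative. Under the standard definition pKi = -log10(Ki in mol/L), a pKi of 8.2 corresponds to a Ki of about 6.3 nM, and a `pKi > 7` filter keeps interactions tighter than 100 nM.

```python
import math


def p_value(molar: float) -> float:
    """Convert a molar constant (Ki, IC50, EC50, Kd) to its -log10 form."""
    return -math.log10(molar)


def molar_from_p(p: float) -> float:
    """Convert a -log10 value such as pKi back to mol/L."""
    return 10.0 ** (-p)


ki_nm = molar_from_p(8.2) * 1e9         # pKi 8.2 expressed in nanomolar
threshold_nm = molar_from_p(7.0) * 1e9  # the pKi > 7 cut-off in nanomolar
print(f"pKi 8.2 is about {ki_nm:.1f} nM; pKi 7 is {threshold_nm:.0f} nM")
```

Because the scale is logarithmic, each unit of pKi is a tenfold change in affinity, so higher values mean tighter binding.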

Confidence Scores

avgConfidenceScore, avgExperimentalScore: PPI evidence strength on a 0-1 scale across 2.8M edges.

score: 0.94 (TP53 — MDM2)

Publications

pubmedIds as native arrays on every evidence-backed edge. Query with WHERE '15208697' IN r.pubmedIds.

PMIDs: [15208697, 22168767]
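The filter above can be wrapped in a parameterized query so the PMID is passed as data rather than spliced into the query text. This is a minimal sketch, assuming the graph is loaded into Neo4j and queried with the official `neo4j` Python driver (a recent 5.x release that provides `execute_query`). Applying the filter to `ASSOCIATED_WITH` edges, and the connection details passed to `run`, are assumptions.

```python
def edges_citing(pmid: str) -> tuple[str, dict]:
    """Build a parameterized Cypher query for edges that cite `pmid`."""
    query = (
        "MATCH (g:Gene)-[r:ASSOCIATED_WITH]->(d:Disorder) "
        "WHERE $pmid IN r.pubmedIds "
        "RETURN g, r, d"
    )
    return query, {"pmid": pmid}


def run(uri: str, user: str, password: str, pmid: str):
    """Execute the query against a Neo4j instance and return the records."""
    # Imported lazily so the query builder works without the driver installed.
    from neo4j import GraphDatabase

    query, params = edges_citing(pmid)
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        records, _summary, _keys = driver.execute_query(query, parameters_=params)
        return records


query, params = edges_citing("15208697")
print(query)
print(params)
```

The PMID is passed as a string to match the quoted form in the filter above. If the arrays hold integers in a given export, the parameter would need to be an integer instead.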

Evidence Types

TextInference, MolecularBasis, GeneticAssociation — on 598K gene-disorder ASSOCIATED_WITH edges.

[MolecularBasis, GeneticAssociation]

Source Attribution

Which databases support each association — DrugBank, ChEMBL, PharmGKB, etc.

[DrugBank, ChEMBL, PharmGKB]

Effect Sizes

oddsRatio, beta, riskAlleleFrequency — GWAS statistical evidence across 1.6M associations.

OR: 1.84, RAF: 0.28

Expression Levels

GTEx RNA expression (TPM) per tissue, IHC microscopy levels, mass spectrometry levels.

TPM: 142.3 (liver)

Regulatory Scores

GeneHancer regulatory element confidence with elite status across 822K enhancer-gene edges.

score: 0.92, elite: true

Enzyme Kinetics

Km, kcat, Vmax with units and substrates across 7,590 Kinetics nodes.

Km: 0.2 mM, kcat: 45 s⁻¹

PTM Positions

Position-specific modifications across 373K PTM nodes. Phosphorylation at Ser-19, Ubiquitination at Lys-48.

Ser-19, phosphorylation

Delivery Formats

Five Formats. Zero Friction.

Import into your stack in minutes, not months.
Choose the format that fits your infrastructure.

Relational Database

Standard SQL tables for seamless integration with PostgreSQL, MySQL, or any RDBMS. Normalized schema with foreign keys preserving all relationship semantics.

  • SQL-ready normalized tables
  • Relationships preserved via foreign keys
  • Works with any BI/analytics tool
~5 min import
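The following sketch shows how a gene-compound relationship might look once it is expressed as foreign keys, using SQLite from the Python standard library. The table and column names are hypothetical and the delivered schema may differ. The single data row reuses the Gefitinib and EGFR example (pKi 8.2) quoted elsewhere on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table and column names; the delivered schema may differ.
    CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
    CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE gene_compound (
        gene_id INTEGER NOT NULL REFERENCES gene(id),
        compound_id INTEGER NOT NULL REFERENCES compound(id),
        best_pki REAL
    );
    INSERT INTO gene VALUES (1, 'EGFR');
    INSERT INTO compound VALUES (1, 'Gefitinib');
    INSERT INTO gene_compound VALUES (1, 1, 8.2);
    """
)

# Relational counterpart of the TARGETS pattern with a pKi threshold.
rows = conn.execute(
    """
    SELECT g.symbol, c.name, gc.best_pki
    FROM gene_compound AS gc
    JOIN gene AS g ON g.id = gc.gene_id
    JOIN compound AS c ON c.id = gc.compound_id
    WHERE gc.best_pki > ?
    """,
    (7,),
).fetchall()
print(rows)
```

Each graph edge becomes a row in a link table, and the edge properties (here the pKi value) become ordinary columns that any SQL tool can filter on.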

JSON

Structured JSON export with nested objects preserving the full richness of every node and edge. Perfect for APIs, data pipelines, and programmatic access.

  • Nested property objects intact
  • API-ready with schema definition
  • Stream or batch processing
Instant — stream or load
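As one possible way to stream such an export, the sketch below reads one JSON object per line and filters on a nested edge property. The JSON Lines layout and the field names (`type`, `properties`) are assumptions, since the export schema is not described here. The property values echo the quantified-edge example shown earlier on this page.

```python
import io
import json

# Assumed layout: one JSON object per line (JSON Lines). Field names are
# hypothetical; the values echo the quantified-edge example on this page.
sample = io.StringIO(
    '{"type": "TARGETS", "properties": {"pKi": 8.2, "pIC50": 7.4, '
    '"mechanisms": ["Inhibition"], "sources": ["DrugBank", "ChEMBL"], '
    '"score": 0.85}}\n'
)


def stream_edges(handle, min_pki=7.0):
    """Yield edge records whose nested pKi exceeds `min_pki`, line by line."""
    for line in handle:
        if not line.strip():
            continue  # tolerate blank lines in the stream
        edge = json.loads(line)
        pki = edge.get("properties", {}).get("pKi")
        if pki is not None and pki > min_pki:
            yield edge


strong = list(stream_edges(sample))
print(strong[0]["properties"]["sources"])
```

Reading line by line keeps memory use flat regardless of file size, which is the main reason to prefer streaming over loading a whole export at once.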

Native Graph Database

Ready for neo4j-admin database import. Typed CSV headers, native array properties, and full-text search indexing — imports in ~2.5 minutes on commodity hardware.

  • Auto-generated import script included
  • Native array properties for multi-valued fields
  • Full-text search indexable
~2.5 min import
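To show what typed CSV headers and array properties look like in practice, here is a small sketch that writes node and relationship files in the header syntax used by neo4j-admin import. The column names, IDs, and values are placeholders chosen for illustration and are not the shipped schema. The type suffixes and the `;` array delimiter follow the import tool's documented defaults.

```python
import csv
import io

# Hypothetical column names. The suffixes (:ID, :LABEL, :START_ID, :END_ID,
# :TYPE, :float, :string[]) follow the neo4j-admin import header syntax.
NODE_HEADER = ["geneId:ID(Gene)", "symbol", "aliases:string[]", ":LABEL"]
REL_HEADER = [
    ":START_ID(Gene)",
    ":END_ID(Compound)",
    ":TYPE",
    "bestPKi:float",
    "pubmedIds:string[]",
]


def to_csv(header, rows):
    """Serialize rows, joining list values with ';' (default array delimiter)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([";".join(v) if isinstance(v, list) else v for v in row])
    return buf.getvalue()


# Placeholder IDs and values, for illustration only.
node_csv = to_csv(NODE_HEADER, [["g1", "GENE_A", ["ALIAS_1", "ALIAS_2"], "Gene"]])
rel_csv = to_csv(REL_HEADER, [["g1", "c1", "TARGETS", 8.2, ["15208697", "22168767"]]])
print(node_csv)
print(rel_csv)
```

Declaring the type in the header lets the importer store `bestPKi` as a number and `pubmedIds` as a native array, which is what makes numeric filters and `IN` checks work after import.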

Standards-Compliant TSV

Biolink Model-compliant format with CURIEs for all identifiers. Compatible with the Translator ecosystem, Monarch Initiative, and KGX tooling for graph ML pipelines.

  • CURIEs for all identifiers
  • Translator & Monarch integration compatible
  • KGX-ready for graph ML and analytics
Ready for KGX tooling
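A minimal sketch of reading a KGX-style edge file and splitting its CURIEs is shown below. The sample row uses placeholder CURIEs with an `EX` prefix. The `subject`, `predicate`, `object` columns follow the usual KGX edge layout, and the actual files may carry additional columns.

```python
import csv
import io

# Illustrative edge file in KGX-style TSV. The row uses placeholder CURIEs;
# the column names follow the usual KGX edge layout.
EDGES_TSV = (
    "subject\tpredicate\tobject\n"
    "EX:gene_1\tbiolink:gene_associated_with_condition\tEX:disorder_1\n"
)


def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE into (prefix, local ID) at the first colon."""
    prefix, sep, local = curie.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local


edges = list(csv.DictReader(io.StringIO(EDGES_TSV), delimiter="\t"))
for edge in edges:
    print(split_curie(edge["subject"]), edge["predicate"], split_curie(edge["object"]))

# Identifiers quoted on this page split the same way.
print(split_curie("ICD10:E11.9"))
print(split_curie("CAS:58-08-2"))
```

Splitting at the first colon keeps local IDs intact even when they contain further punctuation, as in `CAS:58-08-2`.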

API Access

Integrate GeneCards data directly into your applications, tools, or platforms with our RESTful API. Programmatic access to the full knowledge graph.

  • RESTful endpoints
  • Integrate into existing pipelines
  • Custom data feeds available
Real-time access

Use Cases

Built for Discovery

From target validation to AI grounding — one graph powers them all.

Target Validation

Traverse gene-disease-compound-pathway networks to validate drug targets with quantitative evidence.

Find all compounds targeting EGFR pathway genes with pKi > 7 — one query, 3 seconds.

Biomarker Discovery

Filter by GWAS signal, expression specificity, and gene constraint to identify candidate biomarkers.

Filter 13.7K GWAS traits by gene constraint score to find high-confidence biomarkers.

Drug Repurposing

Multi-hop queries across pathways and disease associations to find new indications for existing drugs.

Traverse BRCA1 → pathways → shared genes → compounds to find repurposing candidates.

AI / RAG Grounding

Pre-integrated gene summaries and structured data for LLM grounding and retrieval-augmented generation.

174K gene summaries + 59.7K disorder summaries, pre-formatted for LLM ingestion.

Variant Interpretation

Link variants to genes, disorders, proteins, and phenotypes with clinical significance and pathogenicity.

Link 2.8M variants to genes, disorders, and phenotypes with clinical significance.

Competitive Intelligence

Map the clinical trial landscape per target with compound mechanisms, approval status, and trial counts.

113K clinical trials mapped to gene targets with compound mechanisms and status.

Prior Art Research

Validate and protect innovations with comprehensive gene-compound-disorder associations backed by publication evidence.

Trace any gene-drug association to its original PubMed sources for patent filings.

Clinical Diagnosis

Decipher whole genomes and exomes to illuminate links between variants, genes, disorders, and phenotypes.

VarElect has been used in the diagnosis of 100,000+ exome and whole-genome cases.

Integrated from 200+ Sources

Authoritative biomedical databases, unified and updated with every GeneCards release.

Unified identifiers. Harmonized schemas. Pre-computed relationships.

Genes & Genomics

20+ sources
NCBI Gene, Ensembl, HGNC, RefSeq, UCSC, GeneHancer, ENCODE, FANTOM, RNACentral, miRBase, LncBook, PseudoGene, and more

Disorders & Clinical

15+ sources
OMIM, MONDO, MeSH, UMLS, MedGen, ICD-10, ICD-11, Orphanet, SNOMED CT, Disease Ontology, ClinicalTrials.gov, HPO, ODiseA, and more

Compounds & Drugs

15+ sources
DrugBank, ChEMBL, PubChem, DGIdb, IUPHAR/BPS, CAS Registry, ATC, RxNorm, BitterDB, MedChemExpress, Tocris, ApexBio, and more

Proteins & Structure

15+ sources
UniProt, InterPro, PDB, AlphaFold, NextProt, Pfam, PANTHER, CDD, SMART, SUPERFAMILY, ProSite, GlyConnect, and more

Interactions & Pathways

10+ sources
STRING, BioGRID, IntAct, Reactome, WikiPathways, KEGG, Gene Ontology, Signor, MINT, PathwayCommons, and more

Expression & Variants

15+ sources
GTEx, Human Protein Atlas, BGEE, ClinVar, dbSNP, DGV, COSMIC, GWAS Catalog, FABRIC, GenomeRNAi, MGI, Homologene, and more

+130 More Sources

What Our Customers Say

Accenture had built its genomics platform through manual harvesting of public data sets. It was time consuming, partial in the data obtained, and very expensive time wise. GeneCards completely changed the capabilities of our platform. The data was far more complete, already linked and had multiple data sets we had not discovered. We now have a mature platform thanks to GeneCards. I would recommend investing in this asset to any users that are serious about managing research in the biomedical area.

Cecil O. Lynch, MD, MS
Global Biomedical Informatics Lead, Accenture

Trusted by leading organizations

Accenture
GSK
Sanofi
Genentech
CAS
Colossal
PhyloBio
EveryCure

Ready to Explore the Graph?

Request a sample dataset, schedule a technical walkthrough, or speak with our data licensing team. Available as per-release files or as a hosted instance.

Flexible licensing for academic and commercial use.