GeneCards Knowledge Graph

>190 Sources. One Graph.

The biomedical data integration you'd spend years building — ready to query today.

Request Access

90M+

Nodes

190M+

Relationships

700M+

Properties

>190

Data Sources

5M+

Researchers

What's in the Graph

Core Entities

Genes, proteins, compounds, disorders.

Gene 471K

Protein 271K

Compound 103K

Disorder 22.8K

Clinical & Genomic

Variants, trials, phenotypes, GWAS.

Gene Variant 2.8M

GWAS Trait 13.7K

Phenotype 30.7K

Clinical Trial 138K

Genomic Location 591K

Transcript 2.0M

Functional & Structural

Pathways, enzymes, domains, PTMs.

Pathway 4.5K

GO Term 19K

Enzyme 8.9K

Protein Domain 830K

Protein Structure 150K

PTM 375K

Expression & Evolution

Tissue expression, orthologs, enhancers.

RNA Expression 57 tissues

Protein Expression IHC + MS

Orthologs 12.5M

Enhancer 352K

Complex 1.6K

Kinetics 7.9K

Summaries & AI

AI-generated insights and publications.

Gene Summary (+ 21K AI) 174K

Disorder Summary (+ 17K AI) 60K

Disorder Description 8.9K

Publication 2.9M

Cross-References

50 namespaces, universal identifiers.

Cross-Reference 13.8M

Gene Alias 5.1M

Disorder Alias 229K

Symptom 68K

The Integration Hub

One Identifier.
The Entire Research Universe.

Every identifier in your data is a gateway.
Enter at any point — traverse the full biomedical landscape.

Patient Cohort

ICD10:E11.9

Disorder

Type 2 Diabetes Mellitus

682 Genes · 16,320 Compounds · 3,317 Pathways · Clinical Trials · Phenotypes · Publications

GWAS Study

rs80357906

Gene

BRCA1

64 Compounds · 208 Disorders · 83 Pathways · 12,359 Variants · Expression · Orthologs

Compound Library

CAS:58-08-2

Compound

Caffeine

76 Gene Targets · Binding Affinities · Disorders · Pathways · Clinical Trials · Publications

Variant Pipeline

ClinVar:182956

Gene

TP53

3,499 Variants · 825 Disorders · 475 Compounds · 232 Pathways · 23 Cancer Types · Publications

13.8M

cross-references

50

namespaces

>190

data sources

Connect Your Data

CAS numbers, DrugBank IDs, ICD-10 codes, rsNumbers, Ensembl IDs, HGNC symbols, ClinVar, UniProt — whatever your data uses, join via 13.8M cross-references. No entity resolution needed.
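As a rough illustration of joining local data through cross-references, the sketch below resolves identifiers written as PREFIX:local_id against a small lookup table. The table contents, the node keys, and the function names are hypothetical placeholders for this example and do not describe the delivered schema.

```python
# Hypothetical cross-reference rows: (namespace, external id) -> graph node key.
# Node keys and table layout are illustrative only, not the delivered schema.
XREFS = {
    ("CAS", "58-08-2"): "Compound:caffeine",
    ("ICD10", "E11.9"): "Disorder:type_2_diabetes_mellitus",
}


def parse_curie(curie: str) -> tuple[str, str]:
    """Split 'PREFIX:local_id' at the first colon and upper-case the prefix."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a PREFIX:local_id identifier: {curie!r}")
    return prefix.upper(), local_id


def resolve(curies):
    """Map each identifier to a graph node key, or None when no xref matches."""
    return {curie: XREFS.get(parse_curie(curie)) for curie in curies}


if __name__ == "__main__":
    for curie, node in resolve(["CAS:58-08-2", "icd10:E11.9", "CAS:00-00-0"]).items():
        print(curie, "->", node)
```

Identifiers that find no match come back as None, which makes unmatched rows easy to report before any join is attempted.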

Bridge to Other Graphs

OpenTargets, Monarch, Translator — our Xrefs use the same identifiers. GeneCards connects your data to the public ecosystem.

One Query, Full Context

One Cypher or SQL query spans genes, proteins, diseases, drugs, pathways, and more. No data silos, no data format nightmares.

Traverse Meaningful Relationships

Every reified association is attested. Chain (:Evidence)-[:SUPPORTS]->(Association)-[:FROM_SOURCE]->(Source), then follow -[:CITES]->(Publication), to trace any claim back to ChEMBL, ClinVar, DrugBank, and other sources, with PubMed citations.
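The provenance chain described above could be packaged as a parameterized query along these lines. This is a sketch only: the Evidence, Source, and Publication labels follow the quoted pattern, while the `id` property, the `$associationId` parameter, and the helper name are assumptions made for illustration. The code only builds the request, so running it against a database is left to whichever driver you use.

```python
# Cypher text assembled from the pattern quoted above. The `id` property and
# the $associationId parameter are assumptions made for this sketch.
PROVENANCE_QUERY = """
MATCH (e:Evidence)-[:SUPPORTS]->(a)-[:FROM_SOURCE]->(s:Source)
WHERE a.id = $associationId
OPTIONAL MATCH (e)-[:CITES]->(p:Publication)
RETURN s AS source, collect(DISTINCT p) AS publications
""".strip()


def provenance_request(association_id: str) -> dict:
    """Bundle the query text with its parameters; execution is up to the caller."""
    if not association_id:
        raise ValueError("association_id must be a non-empty string")
    return {
        "query": PROVENANCE_QUERY,
        "parameters": {"associationId": association_id},
    }


if __name__ == "__main__":
    request = provenance_request("hypothetical-association-id")
    print(request["query"])
    print(request["parameters"])
```

Passing the identifier as a parameter rather than formatting it into the string keeps the query text constant and avoids quoting problems.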

Drug Targeting

GeneCompound

143K relationships

Drug-gene interactions with binding affinity (pKi, pIC50) and mechanism of action data.

MATCH (g:Gene)<-[:HAS_SUBJECT]-(a:GeneCompoundAssociation)-[:HAS_OBJECT]->(c:Compound) WHERE a.bestPKi > 7 RETURN g, c, a.bestPKi

Disease Association

GeneDisorder

393K relationships

Gene-disease links with evidence types, scores, and multi-source attribution.

MATCH (g:Gene)<-[:HAS_SUBJECT]-(a:GeneDisorderAssociation)-[:HAS_OBJECT]->(d:Disorder) RETURN g, a, d

Molecular Interactions

GeneProtein

5.0M relationships

Gene and protein interactions with aggregated confidence scores from multiple sources. Polymorphic endpoints span protein–protein and gene-level interactions.

MATCH (p1:Protein)<-[:HAS_SUBJECT]-(:GeneInteractionAssociation {endpointKinds:'protein-protein'})-[:HAS_OBJECT]->(p2:Protein) RETURN p1, p2

GWAS Associations

GeneGwasTrait

1.75M relationships

Gene-trait connections with SNP scores, risk allele frequencies, and odds ratios.

MATCH (g:Gene)<-[:HAS_SUBJECT]-(:GeneGwasAssociation)-[:HAS_OBJECT]->(t:GwasTrait) RETURN g, t

Expression Profiling

GeneExpression

5.6M relationships

Tissue-specific expression with GTEx TPM values and IHC protein levels.

MATCH (g:Gene)-[r:RNA_EXPRESSED_IN]->(a:Anatomy) RETURN g, r, a

Drug Repurposing

GenePathwayCompound

multi-hop relationships

Connecting genes, pathways, diseases, and compounds for new indications.

MATCH (g:Gene)-[:IN_PATHWAY]->(:Pathway)<-[:IN_PATHWAY]-(:Gene)<-[:HAS_SUBJECT]-(:GeneCompoundAssociation)-[:HAS_OBJECT]->(c:Compound) RETURN DISTINCT g, c
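To make the multi-hop pattern concrete, here is a small in-memory sketch of the same traversal: seed gene, then its pathways, then other genes in those pathways, then the compounds associated with those genes. All node names are placeholders, not real graph content, and the function name is hypothetical.

```python
from collections import defaultdict

# Placeholder edges only; none of these names are real graph content.
IN_PATHWAY = [
    ("GENE_A", "PATHWAY_1"),
    ("GENE_B", "PATHWAY_1"),
    ("GENE_C", "PATHWAY_2"),
]
GENE_COMPOUND = [
    ("GENE_A", "COMPOUND_Z"),
    ("GENE_B", "COMPOUND_X"),
    ("GENE_C", "COMPOUND_Y"),
]


def repurposing_candidates(seed_gene: str) -> set:
    """Compounds associated with other genes that share a pathway with the seed."""
    pathways_of = defaultdict(set)
    genes_in = defaultdict(set)
    for gene, pathway in IN_PATHWAY:
        pathways_of[gene].add(pathway)
        genes_in[pathway].add(gene)

    compounds_of = defaultdict(set)
    for gene, compound in GENE_COMPOUND:
        compounds_of[gene].add(compound)

    candidates = set()
    for pathway in pathways_of.get(seed_gene, set()):
        # Skip the seed itself: a Cypher pattern cannot reuse the same
        # IN_PATHWAY relationship twice, so the seed never matches here.
        for neighbor in genes_in[pathway] - {seed_gene}:
            candidates |= compounds_of.get(neighbor, set())
    return candidates


if __name__ == "__main__":
    print(sorted(repurposing_candidates("GENE_A")))
```

In the toy data, GENE_A shares PATHWAY_1 with GENE_B only, so the traversal reaches COMPOUND_X and nothing else.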

Protein Modifications

ProteinPTM

375K relationships

Post-translational modification sites with amino acid positions, types, and subtypes.

MATCH (p:Protein)-[:HAS_PTM]->(m:PTM) RETURN p, m

Catalytic Reactions

ProteinEnzyme

15K relationships

Protein catalytic activities with biochemical equations, Rhea IDs, and EC numbers.

MATCH (p:Protein)<-[:HAS_SUBJECT]-(:ProteinEnzymeAssociation)-[:HAS_OBJECT]->(e:Enzyme) RETURN p, e

Cancer Associations

GeneCancerType

341K relationships

TCGA/FABRIC-derived gene-cancer links with mutation enrichment and statistical significance.

MATCH (g:Gene)-[:CANCER_ASSOCIATED]->(c:CancerType) RETURN g, c

Every Edge Tells A Story

Not Just Links.
Strength. Publications. Evidence. Sources.

Most knowledge graphs give you binary associations.
Ours gives you the full story — binding affinities, confidence scores, PubMed citations, evidence types, and source attribution, captured as first-class, queryable provenance you can trace back to the source.

Conventional Graph

Binary Link

BRCA1 → Breast Cancer

GeneCards Knowledge Graph

Quantified Edge

pKi: 8.2, pIC50: 7.4, mechanisms: [Inhibition], sources: [DrugBank, ChEMBL], score: 0.85

What Lives on Every Relationship

Binding Affinity

pKi, pIC50, pEC50, pKd: drug binding strength on a -log10 scale, where higher values mean tighter binding. Populated on 14.5K GeneCompoundAssociations with measured affinity.

pKi: 8.2 (Gefitinib → EGFR)
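For readers less used to the -log10 scale, the sketch below converts between a molar constant and its p-value form. The function names are illustrative. Under the standard definition, a threshold of pKi > 7 corresponds to Ki below 100 nM, and pKi 8.2 corresponds to roughly 6.3 nM.

```python
import math


def p_scale(concentration_molar: float) -> float:
    """Return -log10 of a concentration or constant given in mol/L."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_molar)


def molar_from_p(p_value: float) -> float:
    """Invert the -log10 transform, returning a value in mol/L."""
    return 10.0 ** (-p_value)


if __name__ == "__main__":
    print(f"Ki of 100 nM gives pKi {p_scale(100e-9):.2f}")
    print(f"pKi 8.2 gives Ki of {molar_from_p(8.2) * 1e9:.1f} nM")
```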

Confidence Scores

avgConfidenceScore, avgExperimentalScore — interaction evidence strength on a 0-1 scale across 5.0M gene/protein interaction associations.

score: 0.94 (TP53 — MDM2)

Publications

Every claim links to its literature through first-class CITES edges — 14M+ citation links you can traverse straight to the source PubMed records.

(Evidence)-[:CITES]->(PMID:15208697)

Evidence Types

TextInference, MolecularBasis, GeneticAssociation — across 574K gene-disorder evidence attestations.

[MolecularBasis, GeneticAssociation]

Source Attribution

Which databases support each association — DrugBank, ChEMBL, PharmGKB, etc.

[DrugBank, ChEMBL, PharmGKB]

Effect Sizes

oddsRatio, beta, riskAlleleFrequency — GWAS statistical evidence across 1.75M associations.

OR: 1.84, RAF: 0.28
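For case-control traits analyzed with logistic regression, the effect size beta is conventionally the natural log of the odds ratio. Whether the graph stores its beta property on exactly that scale is an assumption here, so treat the sketch below as a generic conversion aid rather than a description of the data.

```python
import math


def beta_from_odds_ratio(odds_ratio: float) -> float:
    """Natural log of the odds ratio (log-odds effect per risk allele)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio)


def odds_ratio_from_beta(beta: float) -> float:
    """Inverse transform: exponentiate a log-odds effect."""
    return math.exp(beta)


if __name__ == "__main__":
    beta = beta_from_odds_ratio(1.84)
    print(f"OR 1.84 corresponds to beta of about {beta:.2f}")
```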

Expression Levels

GTEx RNA expression (TPM) per tissue, IHC microscopy levels, mass spectrometry levels.

TPM: 142.3 (liver)

Regulatory Scores

GeneHancer regulatory element confidence with elite status across 2.2M enhancer-gene edges.

score: 0.92, elite: true

Enzyme Kinetics

Km, kcat, Vmax with units and substrates across 7,938 Kinetics nodes.

Km: 0.2 mM, kcat: 45 s⁻¹
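Kinetic constants like these are usually interpreted through the Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]), and through the catalytic efficiency kcat/Km. A minimal sketch using the sample values shown above follows; the function names are illustrative, and unit handling is left to the caller.

```python
def michaelis_menten_rate(vmax: float, km: float, substrate: float) -> float:
    """Reaction rate v = Vmax * [S] / (Km + [S]); Km and [S] share one unit."""
    if km <= 0 or substrate < 0:
        raise ValueError("Km must be positive and [S] non-negative")
    return vmax * substrate / (km + substrate)


def catalytic_efficiency(kcat_per_s: float, km_molar: float) -> float:
    """kcat / Km in M^-1 s^-1, with Km given in mol/L."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_molar


if __name__ == "__main__":
    # Sample values from the text: Km = 0.2 mM, kcat = 45 per second.
    efficiency = catalytic_efficiency(45.0, 0.2e-3)
    print(f"kcat/Km is about {efficiency:.3g} per molar per second")
    # At [S] equal to Km the rate is half of Vmax.
    print(michaelis_menten_rate(vmax=1.0, km=0.2, substrate=0.2))
```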

PTM Positions

Position-specific modifications across 375K PTM nodes, for example phosphorylation at Ser-19 or ubiquitination at Lys-48.

Ser-19, phosphorylation

Delivery Formats

Five Formats. Zero Friction.

Import into your stack in minutes, not months.
Choose the format that fits your infrastructure.

Relational Database

Standard SQL tables for seamless integration with PostgreSQL, MySQL, or any RDBMS. Normalized schema with foreign keys preserving all relationship semantics.

✓ SQL-ready normalized tables
✓ Relationships preserved via foreign keys
✓ Works with any BI/analytics tool

~5 min import
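To show what a normalized, foreign-key layout for a reified association might look like, here is a self-contained SQLite sketch. The table and column names are hypothetical and are not the shipped schema. The single sample row reuses the Gefitinib and EGFR pKi example quoted elsewhere on this page.

```python
import sqlite3

# Hypothetical table and column names; the shipped schema may differ.
SCHEMA = """
CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT NOT NULL);
CREATE TABLE compound (compound_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE gene_compound_association (
    association_id TEXT PRIMARY KEY,
    gene_id TEXT NOT NULL REFERENCES gene(gene_id),
    compound_id TEXT NOT NULL REFERENCES compound(compound_id),
    best_pki REAL
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database holding one illustrative association."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene VALUES (?, ?)", ("g1", "EGFR"))
    conn.execute("INSERT INTO compound VALUES (?, ?)", ("c1", "Gefitinib"))
    conn.execute(
        "INSERT INTO gene_compound_association VALUES (?, ?, ?, ?)",
        ("a1", "g1", "c1", 8.2),
    )
    return conn


def strong_binders(conn: sqlite3.Connection, min_pki: float = 7.0) -> list:
    """Join the association table back to genes and compounds above a pKi cutoff."""
    return conn.execute(
        """
        SELECT g.symbol, c.name, a.best_pki
        FROM gene_compound_association AS a
        JOIN gene AS g ON g.gene_id = a.gene_id
        JOIN compound AS c ON c.compound_id = a.compound_id
        WHERE a.best_pki > ?
        ORDER BY a.best_pki DESC
        """,
        (min_pki,),
    ).fetchall()


if __name__ == "__main__":
    print(strong_binders(build_demo()))
```

Keeping the association in its own table is what lets edge properties such as affinity live alongside the two foreign keys.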

JSON

Structured JSON export with nested objects preserving the full richness of every node and edge. Perfect for APIs, data pipelines, and programmatic access.

✓ Nested property objects intact
✓ API-ready with schema definition
✓ Stream or batch processing

Instant — stream or load
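Stream processing of a JSON export might look like the following, assuming one JSON object per line (JSON Lines) with `id`, `label`, and nested `properties` keys. Both the line-delimited layout and the key names are assumptions for this sketch, so check the schema definition that ships with the export.

```python
import io
import json
from collections import Counter

# Assumed record shape (one JSON object per line); the real export may differ.
SAMPLE = (
    '{"id": "n1", "label": "Gene", "properties": {"symbol": "BRCA1"}}\n'
    '{"id": "n2", "label": "Gene", "properties": {"symbol": "TP53"}}\n'
    '{"id": "n3", "label": "Compound", "properties": {"name": "Caffeine"}}\n'
)


def count_labels(lines) -> Counter:
    """Count records per label, reading one line at a time."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record["label"]] += 1
    return counts


if __name__ == "__main__":
    # An open file handle works the same way as this in-memory stream.
    print(count_labels(io.StringIO(SAMPLE)))
```

Because the function consumes any iterable of lines, memory use stays flat regardless of file size.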

Native Graph Database

Ready for neo4j-admin database import. Typed CSV headers, native array properties, and full-text search indexing — imports in ~2.5 minutes on commodity hardware.

✓ Auto-generated import script included
✓ Native array properties for multi-valued fields
✓ Full-text search indexable

~2.5 min import
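For orientation, typed headers for neo4j-admin import follow the `name:TYPE` convention, with `:ID` and `:LABEL` on node files, `:START_ID`, `:END_ID`, and `:TYPE` on relationship files, and array values separated by `;` by default. The sketch below writes a tiny node file and relationship file in that style. The column names and rows are placeholders, not the delivered files.

```python
import csv
import io

ARRAY_DELIMITER = ";"  # neo4j-admin's default separator for array values


def write_nodes(rows, out) -> None:
    """Write Gene nodes with a typed header, including a string-array column."""
    writer = csv.writer(out)
    writer.writerow(["geneId:ID", "symbol", "aliases:string[]", ":LABEL"])
    for gene_id, symbol, aliases in rows:
        writer.writerow([gene_id, symbol, ARRAY_DELIMITER.join(aliases), "Gene"])


def write_relationships(rows, out) -> None:
    """Write relationships as start id, end id, and relationship type."""
    writer = csv.writer(out)
    writer.writerow([":START_ID", ":END_ID", ":TYPE"])
    for start_id, end_id, rel_type in rows:
        writer.writerow([start_id, end_id, rel_type])


if __name__ == "__main__":
    nodes, rels = io.StringIO(), io.StringIO()
    # Placeholder rows only; ids and aliases are not real graph content.
    write_nodes([("g1", "GENE_A", ["ALIAS_1", "ALIAS_2"])], nodes)
    write_relationships([("g1", "pw1", "IN_PATHWAY")], rels)
    print(nodes.getvalue())
    print(rels.getvalue())
```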

Standards-Compliant TSV

Biolink Model-compliant format with CURIEs for all identifiers. Compatible with the Translator ecosystem, Monarch Initiative, and KGX tooling for graph ML pipelines.

✓ CURIEs for all identifiers
✓ Translator & Monarch integration compatible
✓ KGX-ready for graph ML and analytics

Ready for KGX tooling
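A KGX-style TSV pair typically has a nodes file with `id`, `category`, and `name` columns and an edges file with `subject`, `predicate`, and `object` columns. The sketch below loads such files and checks that every edge endpoint is a known node. The `EXAMPLE:` identifiers are placeholders, and the exact columns in the delivered files are an assumption.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows in a KGX-style layout; identifiers are not real.
NODES_TSV = (
    "id\tcategory\tname\n"
    "EXAMPLE:gene1\tbiolink:Gene\tplaceholder gene\n"
    "EXAMPLE:disease1\tbiolink:Disease\tplaceholder disease\n"
)
EDGES_TSV = (
    "subject\tpredicate\tobject\n"
    "EXAMPLE:gene1\tbiolink:gene_associated_with_condition\tEXAMPLE:disease1\n"
)


def read_tsv(text: str) -> list:
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def dangling_edges(nodes, edges) -> list:
    """Return edges whose subject or object is missing from the node list."""
    known = {node["id"] for node in nodes}
    return [e for e in edges if e["subject"] not in known or e["object"] not in known]


def edges_by_predicate(edges) -> dict:
    """Group (subject, object) pairs under their predicate."""
    grouped = defaultdict(list)
    for edge in edges:
        grouped[edge["predicate"]].append((edge["subject"], edge["object"]))
    return dict(grouped)


if __name__ == "__main__":
    nodes, edges = read_tsv(NODES_TSV), read_tsv(EDGES_TSV)
    print("dangling:", dangling_edges(nodes, edges))
    print(edges_by_predicate(edges))
```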

API Access

Integrate GeneCards data directly into your applications, tools, or platforms with our RESTful API. Programmatic access to the full knowledge graph.

✓ RESTful endpoints
✓ Integrate into existing pipelines
✓ Custom data feeds available

Real-time access

Use Cases

Built for Discovery

From target validation to AI grounding — one graph powers them all.

Target Validation

Traverse gene-disease-compound-pathway networks to validate drug targets with quantitative evidence.

Find all compounds targeting EGFR pathway genes with pKi > 7 — one query, 3 seconds.

Biomarker Discovery

Filter by GWAS signal, expression specificity, and gene constraint to identify candidate biomarkers.

Filter 13.7K GWAS traits by gene constraint score to find high-confidence biomarkers.

Drug Repurposing

Multi-hop queries across pathways and disease associations to find new indications for existing drugs.

Traverse BRCA1 → pathways → shared genes → compounds to find repurposing candidates.

AI / RAG Grounding

Pre-integrated gene summaries and structured data for LLM grounding and retrieval-augmented generation.

174K gene summaries + 60K disorder summaries, pre-formatted for LLM ingestion.

Variant Interpretation

Link variants to genes, disorders, proteins, and phenotypes with clinical significance and pathogenicity.

Link 2.8M variants to genes, disorders, and phenotypes with clinical significance.

Competitive Intelligence

Map the clinical trial landscape per target with compound mechanisms, approval status, and trial counts.

138K clinical trials mapped to gene targets with compound mechanisms and status.

Prior Art Research

Validate and protect innovations with comprehensive gene-compound-disorder associations backed by publication evidence.

Trace any gene-drug association to its original PubMed sources for patent filings.

Clinical Diagnosis

Decipher whole genomes and exomes to illuminate links between variants, genes, disorders, and phenotypes.

VarElect has been used in the diagnosis of 100,000+ exome and whole-genome cases.

Integrated from >190 Sources

Authoritative biomedical databases, unified and updated with every GeneCards release.

Unified identifiers. Harmonized schemas. Pre-computed relationships.

Genes & Genomics

20+ sources

NCBI Gene, Ensembl, HGNC, RefSeq, UCSC, GeneHancer, ENCODE, FANTOM, RNACentral, miRBase, LncBook, PseudoGene, and more

Disorders & Clinical

15+ sources

OMIM, MONDO, MeSH, UMLS, MedGen, ICD-10, ICD-11, Orphanet, SNOMED CT, Disease Ontology, ClinicalTrials.gov, HPO, ODiseA, and more

Compounds & Drugs

15+ sources

DrugBank, ChEMBL, PubChem, DGIdb, IUPHAR/BPS, CAS Registry, ATC, RxNorm, BitterDB, MedChemExpress, Tocris, ApexBio, and more

Proteins & Structure

15+ sources

UniProt, InterPro, PDB, AlphaFold, NextProt, Pfam, PANTHER, CDD, SMART, SUPERFAMILY, ProSite, GlyConnect, and more

Interactions & Pathways

10+ sources

STRING, BioGRID, IntAct, Reactome, WikiPathways, KEGG, Gene Ontology, Signor, MINT, PathwayCommons, and more

Expression & Variants

15+ sources

GTEx, Human Protein Atlas, BGEE, ClinVar, dbSNP, DGV, COSMIC, GWAS Catalog, FABRIC, GenomeRNAi, MGI, Homologene, and more

+122 More Sources

What Our Customers Say

“Accenture had built its genomics platform through manual harvesting of public data sets. It was time consuming, partial in the data obtained, and very expensive time wise. GeneCards completely changed the capabilities of our platform. The data was far more complete, already linked and had multiple data sets we had not discovered. We now have a mature platform thanks to GeneCards. I would recommend investing in this asset to any users that are serious about managing research in the biomedical area.”

Cecil O. Lynch, MD, MS

Global Biomedical Informatics Lead, Accenture

Trusted by leading organizations

Ready to Explore the Graph?

Request a sample dataset, schedule a technical walkthrough, or speak with our data licensing team. Available as files-per-release or hosted instance.

Flexible licensing for academic and commercial use.
