1000 Genomes Project

Dataset 1000 Genomes
The 1000 Genomes Project is an international effort to create a detailed catalog of human genetic variation. The plan for the full project is to sequence about 2,500 samples from populations around the world at 4X or better coverage, although this plan continues to change as better sequencing technologies come online.
Keywords: biology, genomics
Size: 396.7TB

Identifiers:

  • ark:/31807/osdc-4a3ec448
Last Updated: 2013-06-04 15:30:00 UTC

Access Instructions

All public data sets are available on both commodity internet connections and high speed StarLight/Internet2 connections. We recommend using rsync or UDR to download the data.

Downloading with UDR (UDT enabled rsync)

UDR is a wrapper around rsync that enables rsync to use the high-performance UDT network protocol, which can greatly improve download speeds, especially over high-speed networks. Once UDR is installed, the only change is to place the udr command before the same rsync command you typically use to download the data. UDR is open source and under active development; the most recent version is available on GitHub. At the moment, UDR is not enabled on the transfer node. The functionality should return shortly. Use rsync in the meantime.
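Because UDR only adds a prefix to an otherwise unchanged rsync command, a script can choose between the two tools in one place. The following is a minimal sketch, not part of UDR or rsync: the function name transfer_prefix and the USE_UDR variable are hypothetical names chosen for this example. It falls back to plain rsync unless UDR is explicitly requested and found on the PATH, which matches the current advice to use rsync while UDR is unavailable on the transfer node.

```shell
#!/bin/sh
# Sketch only: USE_UDR and transfer_prefix are hypothetical names chosen for
# this example; they are not part of rsync or UDR.

# Print the command prefix for a transfer: "udr rsync" when UDR is requested
# (USE_UDR=1) and the udr executable is installed, plain "rsync" otherwise.
transfer_prefix() {
    if [ "${USE_UDR:-0}" = "1" ] && command -v udr >/dev/null 2>&1; then
        printf '%s\n' "udr rsync"
    else
        printf '%s\n' "rsync"
    fi
}

# Example (not run here): list the dataset with whichever tool was selected.
# $(transfer_prefix) publicdata.opensciencedatacloud.org::ark:/31807/osdc-4a3ec448/
```

Leaving USE_UDR unset keeps every transfer on plain rsync; setting USE_UDR=1 switches to UDR once it is installed locally and enabled again on the server.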

List the contents of 1000 Genomes Project:

  • Using rsync: rsync publicdata.opensciencedatacloud.org::ark:/31807/osdc-4a3ec448/
  • Using udr: udr rsync publicdata.opensciencedatacloud.org::ark:/31807/osdc-4a3ec448/

Download/synchronize 1000 Genomes Project:

  • Using rsync: rsync -avzuP publicdata.opensciencedatacloud.org::ark:/31807/osdc-4a3ec448/ /path/to/local_copy
  • Using udr: udr rsync -avzuP publicdata.opensciencedatacloud.org::ark:/31807/osdc-4a3ec448/ /path/to/local_copy
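At 396.7TB, a full synchronization is likely to be interrupted at least once. In the flags above, -P is rsync's shorthand for --partial --progress, so partially transferred files are kept, and -u skips files that are already newer on the receiving side; re-running the same command therefore continues rather than starting over. The sketch below wraps a command in a simple retry loop. The names retry_command and RETRY_DELAY are hypothetical, chosen for this example, and the retry count and delay are assumptions to adjust for your connection.

```shell
#!/bin/sh
# Sketch only: retry_command and RETRY_DELAY are hypothetical names chosen
# for this example.

# Run a command up to N times, pausing RETRY_DELAY seconds (default 60)
# between failed attempts. Returns 0 on the first success, 1 if every
# attempt fails.
# Usage: retry_command N command [args...]
retry_command() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only wait if another attempt will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "${RETRY_DELAY:-60}"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (not run here): retry the full synchronization up to 5 times.
# retry_command 5 rsync -avzuP publicdata.opensciencedatacloud.org::ark:/31807/osdc-4a3ec448/ /path/to/local_copy
```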

Download an individual file from 1000 Genomes Project:

  • Using rsync: rsync -avzuP publicdata.opensciencedatacloud.org::ark:/31807/osdc-4a3ec448/remotefile /path/to/local_copy
  • Using udr: udr rsync -avzuP publicdata.opensciencedatacloud.org::ark:/31807/osdc-4a3ec448/remotefile /path/to/local_copy
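When fetching individual files, it can help to build the remote address in one place and to preview the transfer before committing to it. The sketch below assumes the dataset address shown above; REMOTE_BASE, remote_url, fetch_file, and DRY_RUN are hypothetical names chosen for this example, and remotefile remains a placeholder for a real path inside the dataset. The preview relies on rsync's -n (--dry-run) flag, which reports what would be transferred without copying anything.

```shell
#!/bin/sh
# Sketch only: REMOTE_BASE, remote_url, fetch_file, and DRY_RUN are
# hypothetical names chosen for this example.

# rsync module address of the dataset, as given in the access instructions.
REMOTE_BASE="publicdata.opensciencedatacloud.org::ark:/31807/osdc-4a3ec448"

# Join the dataset address and a path relative to the dataset root.
# A single leading slash on the relative path is dropped.
remote_url() {
    printf '%s/%s\n' "$REMOTE_BASE" "${1#/}"
}

# Preview (DRY_RUN=1, the default) or perform (DRY_RUN=0) the download of
# one file. Usage: fetch_file RELATIVE_PATH LOCAL_DESTINATION
fetch_file() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # -n is --dry-run: show what would be copied, transfer nothing.
        rsync -avzuPn "$(remote_url "$1")" "$2"
    else
        rsync -avzuP "$(remote_url "$1")" "$2"
    fi
}

# Example (not run here): preview, then download, a single file.
# fetch_file remotefile /path/to/local_copy
# DRY_RUN=0 fetch_file remotefile /path/to/local_copy
```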

Link to Dataset – 1000 Genomes: 

Kalyan Banga

