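I'm running on macOS and just trying to get the tutorial analysis working. The run below fails with an InvalidInputException (input path does not exist) for refFlat_table, even though both database files load fine through hadoop fs.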
bin/seqspark conf/test.conf
conf file: conf/test.conf
spark options:
18/09/19 14:22:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/19 14:22:41 WARN seqspark.SingleStudy$: using an existing output directory '/Users/rjbohlender/software/seqspark/demo'
18/09/19 14:22:41 INFO ds.Phenotype$: creating phenotype dataframe from simulated.tsv
18/09/19 14:22:45 INFO worker.Import$: start import ...
18/09/19 14:22:45 INFO worker.Import$: using all variants
18/09/19 14:22:45 INFO worker.Import$: using filter: true
18/09/19 14:22:45 INFO worker.Variants$: decompose multi-allelic variants
18/09/19 14:22:45 INFO worker.Annotation$: annotation
18/09/19 14:22:45 INFO worker.Annotation$: link gene database ...
18/09/19 14:22:45 INFO annot.RefGene$: load RefSeq: coord: /Users/rjbohlender/seqspark-db/refFlat_table seq: /Users/rjbohlender/seqspark-db/refGene_seq
18/09/19 14:22:46 ERROR seqspark.SingleStudy$: Something went wrong, exit
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Users/rjbohlender/seqspark-db/refFlat_table
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1337)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.take(RDD.scala:1331)
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1372)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.first(RDD.scala:1371)
at org.dizhang.seqspark.annot.RefGene$.apply(RefGene.scala:61)
at org.dizhang.seqspark.worker.Annotation$.linkGeneDB(Annotation.scala:109)
at org.dizhang.seqspark.worker.Annotation$.apply(Annotation.scala:56)
at org.dizhang.seqspark.worker.Pipeline$.run(Pipeline.scala:91)
at org.dizhang.seqspark.worker.Pipeline$.apply(Pipeline.scala:51)
at org.dizhang.seqspark.SingleStudy$.run(SingleStudy.scala:113)
at org.dizhang.seqspark.SingleStudy$.apply(SingleStudy.scala:51)
at org.dizhang.seqspark.SeqSpark$.main(SeqSpark.scala:68)
at org.dizhang.seqspark.SeqSpark.main(SeqSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
/usr/local/Cellar/hadoop/3.1.1 > hadoop fs -cat /Users/rjbohlender/seqspark-db/refGene_seq | head
2018-09-19 14:25:27,142 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>NM_001308203.1
tctcttgaatgaaggatgggaggggagaaagagagacggagagagagaga
gacgcacagatgtgcacggaggccacagacactgacatttggaattcctt
caggcggacggaatagacctcagcagcggcgtggtgaggacttagctggg
acctggaatcgtatcctcctgtgttttttcagactccttggaaattaagg
aatgcaattctgccaccatgatggaaggattgaaaaaacgtacaaggaag
gcctttggaatacggaagaaagaaaaggacactgattctacaggttcacc
agatagagatggaattaagaaaagcaatggggcaccaaatggattttatg
cggaaattgattgggaaagatataactcacctgagctggatgaagaaggc
tacagcatcagacccgaggaacccggctctaccaaaggaaagcactttta
/usr/local/Cellar/hadoop/3.1.1 > hadoop fs -cat /Users/rjbohlender/seqspark-db/refFlat_table | head
2018-09-19 14:26:31,345 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-09-19 14:26:32,372 INFO namenode.FSEditLog: Number of transactions: 34 Total time for transactions(ms): 5 Number of transactions batched in Syncs: 97 Number of syncs: 24 SyncTimes(ms): 10
geneName name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds
OR4F5 NM_001005484 chr1 + 69090 70008 69090 70008 1 69090, 70008,
OR4F16 NM_001005277 chr1 + 367658 368597 367658 368597 1 367658, 368597,
OR4F3 NM_001005224 chr1 + 367658 368597 367658 368597 1 367658, 368597,
OR4F29 NM_001005221 chr1 + 367658 368597 367658 368597 1 367658, 368597,
OR4F16 NM_001005277 chr1 - 621095 622034 621095 622034 1 621095, 622034,
OR4F3 NM_001005224 chr1 - 621095 622034 621095 622034 1 621095, 622034,
OR4F29 NM_001005221 chr1 - 621095 622034 621095 622034 1 621095, 622034,
SAMD11 NM_152486 chr1 + 861120 879961 861321 879533 14 861120,861301,865534,866418,871151,874419,874654,876523,877515,877789,877938,878632,879077,879287, 861180,861393,865716,866469,871276,874509,874840,876686,877631,877868,878438,878757,879188,879961,
NOC2L NM_015658 chr1 - 879582 894679 880073 894620 19 879582,880436,880897,881552,881781,883510,883869,886506,887379,887791,888554,889161,889383,891302,891474,892273,892478,894308,894594, 880180,880526,881033,881666,881925,883612,883983,886618,887519,887980,888668,889272,889462,891393,891595,892405,892653,894461,894679,
To check that the files are there and accessible at the given paths, I ran hadoop fs -cat on both of them (output above); each one prints its contents without errors.
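One thing I have not confirmed, but which would be consistent with the output: the exception names the path as file:/Users/rjbohlender/seqspark-db/refFlat_table, i.e. on the local filesystem, while hadoop fs resolves the same scheme-less path against whatever default filesystem its own configuration sets. The sketch below only illustrates that idea. The qualify helper is hypothetical (it is not part of seqspark, Spark, or Hadoop), and hdfs://localhost:9000 is an assumed namenode address, not something taken from my setup.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Hypothetical helper: roughly mimic how a scheme-less path is
    resolved against a default filesystem URI."""
    if urlparse(path).scheme:
        # Already fully qualified (e.g. hdfs://... or file:...), leave it alone.
        return path
    base = urlparse(default_fs)
    if base.scheme == "file":
        # Local filesystem: matches the form shown in the exception message.
        return "file:" + path
    return f"{base.scheme}://{base.netloc}{path}"


db = "/Users/rjbohlender/seqspark-db/refFlat_table"

# Default filesystem is local: the path points at the local disk.
print(qualify(db, "file:///"))

# Default filesystem is HDFS (assumed address): same string, different location.
print(qualify(db, "hdfs://localhost:9000"))
```

If this is what is happening, the same path string means two different locations depending on which configuration is picked up. Possible things to try would be making sure the Spark job sees the same Hadoop configuration as the hadoop command (for example via HADOOP_CONF_DIR), or giving a fully qualified URI for the database paths, if seqspark's configuration accepts one.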