Convergence for bptt grads:
1.5078654e-06
-4.309501
-4.309342
-4.309467
-4.3093963
-4.3096657
-4.309655
-4.3093233
-4.309431
-4.3094716
-4.3093767
-4.3095856
-4.309495
-4.309357
-4.30965
-4.3096595
-4.3093715
-4.3095775
-4.309465
-4.309655
-4.309694
-4.309061
-4.3096943
-4.30973
-4.3094115
-4.3093624
-4.309664
-4.3092375
-4.309476
-4.309827
-4.3095593
-4.309277
-4.309699
-4.3094254
-4.30946
-4.309447
-4.3093667
-4.3093295
-4.309466
-4.3096604
-4.309426
Final cosine with grad:	 0.9999998807907104
Final dist with grad:	 0.0005332424771040678
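The last two lines compare the BPTT gradient against a reference gradient. The sketch below shows one way such a comparison could be computed. It is not taken from the code that produced this log. It assumes that "cosine" is the cosine similarity between the two flattened gradient vectors and that "dist" is their Euclidean (L2) distance. The function names (cosine_similarity, l2_distance, compare_grads) are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    if len(a) != len(b):
        raise ValueError("gradients must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two flattened gradient vectors (assumed meaning of 'dist')."""
    if len(a) != len(b):
        raise ValueError("gradients must have the same length")
    return math.dist(a, b)


def compare_grads(bptt_grad: Sequence[float],
                  reference_grad: Sequence[float]) -> Tuple[float, float]:
    """Print and return (cosine, distance) in the same format as the log lines above."""
    cos = cosine_similarity(bptt_grad, reference_grad)
    dist = l2_distance(bptt_grad, reference_grad)
    print(f"Final cosine with grad:\t {cos}")
    print(f"Final dist with grad:\t {dist}")
    return cos, dist


if __name__ == "__main__":
    # Hypothetical toy gradients that differ only slightly in one component.
    compare_grads([1.0, 2.0, 3.0], [1.0, 2.0, 3.0001])
```

A cosine this close to 1 means the two gradients point in nearly the same direction. The distance additionally captures any difference in magnitude, so reporting both catches errors that one metric alone would miss.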
