Dataset Collection:

About Dataset: French/English parallel texts for training translation models, with over 22.5 million sentences in French and English. The dataset was created by Chris Callison-Burch, who crawled millions of web pages, used a set of simple heuristics to transform French URLs into English URLs, and assumed that the resulting document pairs are translations of each other. It is the main dataset of the 2015 Workshop on Statistical Machine Translation (WMT 2015) and can be used for machine translation and language modeling.
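The URL-mapping step can be illustrated with a minimal Python sketch. The rewrite rules below (`/fr/` to `/en/`, `lang=fr` to `lang=en`, `_fr.html` to `_en.html`) and the helper names `french_to_english_url` and `pair_documents` are hypothetical. The description above does not list the actual heuristics, so treat this as an assumption-laden illustration rather than the method used to build the corpus. It also assumes that a pair is kept only when the rewritten English URL was itself crawled.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical rewrite rules (pattern, replacement) mapping a French URL to a
# candidate English URL. Illustrative only; NOT the rules used for the corpus.
RULES = [
    (re.compile(r"/fr/"), "/en/"),                      # language path segment
    (re.compile(r"([?&])lang=fr\b"), r"\1lang=en"),     # language query parameter
    (re.compile(r"_fr\.html$"), "_en.html"),            # language suffix in filename
    (re.compile(r"\.fr\.html$"), ".en.html"),           # language extension
]


def french_to_english_url(url: str) -> Optional[str]:
    """Apply the first matching rule; return None if no rule matches."""
    for pattern, replacement in RULES:
        candidate, n_subs = pattern.subn(replacement, url)
        if n_subs:
            return candidate
    return None


def pair_documents(crawled_urls: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each French URL with its rewritten English URL, if that URL was crawled.

    Assumption: documents at the paired URLs are treated as mutual translations.
    """
    urls = list(crawled_urls)
    crawled = set(urls)
    pairs = []
    for url in urls:
        english = french_to_english_url(url)
        if english is not None and english in crawled:
            pairs.append((url, english))
    return pairs
```

For example, given the crawled URLs `http://example.org/fr/page.html` and `http://example.org/en/page.html`, the sketch pairs the two pages. A French page whose rewritten English URL was never crawled is simply dropped, since no candidate translation is available.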

Refer to the paper here: PDF