Thursday, March 1, 2007

Digital Library Of India

A few days back, I came across the Digital Library Of India. The site mentions somewhere that it was started in '05, so shame on me for not having had a clue about it. Oh man, let me tell you, this site has a huge collection of books. Quite a few of them were published even before my grandparents were born! Amidst this heap of books, I could find many, many books on the Vedas, commentaries on Vedic topics by scholars of the 19th and 20th centuries, Kannada literature patronized by Jain settlers and Vedic scholars, arts, novels, story books, what not. I could find many of the Sanskrit plays, Abhi-Jnaana-ShaakuntaLam (Kn - ಅಭಿಜ್ಞಾನ ಶಾಕುಂತಳಮ್, Sa - अभिज्ञान शाकुंतळम्) and Pratijnaa-Yaugandharaayanam (Kn - ಪ್ರತಿಜ್ಞಾ ಯೌಗಂಧರಾಯಣಮ್, Sa - प्रतिज्ञा यौगंधरायणम्) being just two of them...

This collection has many of the books that my grandparents, parents and teachers spoke of, but which had always been very hard for me to find in the bookshops, viz. Kabbigara Kaava (Kn - ಕಬ್ಬಿಗರ ಕಾವ), meaning Protector of Poets, Yashodhara Charitre (Kn - ಯಶೋಧರ ಚರಿತ್ರೆ), Adikavi Pampa's works etc. Usually a few poems or quotes from these would appear in our textbooks, and the book itself would be suggested for reference. Thank god, at last I found a ray of light.

The Digital Library Of India is hosted at four locations.
1. IIIT Hyderabad
2. IISc Bangalore
3. C-DAC Noida
4. Carnegie Mellon University, USA

Among these, IIIT Hyderabad has the largest collection. Though the websites look shabby as of today, my sincere thanks go to those who have been part of this project and made this huge repository of Indian books available to everyone across the globe.

Wait now, is it possible to download a book?

The answer is that none of the above sites provides such a facility. So I had to write a simple Java program that does this: it downloads the TIFF page images of a book one by one and compiles them into a PDF. I am sharing this program with you all so that you can compile a PDF of a book yourself. For compiling the TIFF images into a PDF I have used iText, a free Java-PDF library. If you ever want to thank someone, my request would be to thank those who have actually worked on this DLI project, and the iText team for the Java-PDF library.

Note: I wrote this program in a hurry to compile books that were of interest to me. It may not satisfy all your needs, such as indexing. Use this small program as a reference and feel free to modify it accordingly ;)

1.
/*
* FileName : PdfGenerator.java
* Purpose : To download the Tiff images from Digital Library Of India and
* compile a pdf out of them.
*
* Note : This uses iText Library (iText, a Free Java-PDF Library)
*
* For more Technical and other information about iText please refer to
* http://www.lowagie.com/iText/
*/
package org.extracttiffs;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class PdfGenerator {
    private static String tiffFileExtension = ".tif";

    private static String pdfFileExtension = ".pdf";

    /**
     * @param args
     *            args[0] = URL of the folder that contains the static TIFF
     *            images. If this is the image link,
     *            http://dli.iiit.ac.in//server25/data2/upload/TIL/TIL_OU_MAR_06_HDD_0017/Indian_Languages_Books/200_Series_Kannada/200641_OU_Kabbigara_Kaavan/PTIFF/00000005.tif
     *            remove the page number and retain everything up to
     *            http://....PTIFF/
     *
     *            args[1] = Local folder name where the pdf has to be generated
     *
     *            args[2] = Local file name to be created
     */
    public static void main(String[] args) {

        if (args.length != 3) {
            System.out.println("Usage : java org.extracttiffs.PdfGenerator "
                    + "<folderURL> <localFolder> <pdfFileName>");
            return;
        }

        /* URL of the folder that contains static images */
        String strURL = args[0];
        if (!(strURL.endsWith("/") || strURL.endsWith("\\"))) {
            strURL = strURL + "/";
        }

        /*
         * Local folder name for the tiffs to be downloaded and pdf to be
         * created out of it.
         */
        String strLocalDirectory = args[1];
        if (!(strLocalDirectory.endsWith("/")
                || strLocalDirectory.endsWith("\\"))) {
            strLocalDirectory = strLocalDirectory + "/";
        }

        String pdfFileName = strLocalDirectory + args[2];
        if (!pdfFileName.endsWith(pdfFileExtension)) {
            pdfFileName = pdfFileName + pdfFileExtension;
        }

        /*
         * In DLI, page number is an 8-digit number. Counting from 100000001
         * and dropping the leading 1 keeps the leading zeros.
         */
        int pageNumber = 100000001;

        String strPageNumber = null;

        int bytesRead = -1;
        byte[] buffer = new byte[8192];
        boolean downloadComplete = false;
        String individualURL = null;
        String tiffImageFileName = null;

        Tiff2PdfWriter tiff2PdfWriter = new Tiff2PdfWriter();
        if (!tiff2PdfWriter.initialize(pdfFileName)) {
            /* The pdf file could not be created, so there is nothing to do. */
            return;
        }

        while (!downloadComplete) {

            strPageNumber = String.valueOf(pageNumber).substring(1);

            individualURL = strURL + strPageNumber + tiffFileExtension;
            URL url = null;
            try {
                url = new URL(individualURL);
            } catch (MalformedURLException e) {
                /* A malformed URL can never be downloaded, so stop here. */
                System.out.println("Invalid URL " + e.toString());
                break;
            }

            tiffImageFileName = strLocalDirectory + strPageNumber
                    + tiffFileExtension;
            File tiffFile = new File(tiffImageFileName);
            try {
                URLConnection urlconnection = url.openConnection();
                /* Both streams are closed automatically, even on failure. */
                try (InputStream fis = urlconnection.getInputStream();
                        FileOutputStream fos = new FileOutputStream(tiffFile)) {
                    System.out.println("Download Started for " + individualURL);
                    /* Copy in chunks instead of one byte at a time. */
                    while ((bytesRead = fis.read(buffer)) != -1) {
                        fos.write(buffer, 0, bytesRead);
                    }
                }
                System.out.println("Single Page Download complete");
                tiff2PdfWriter.addTiffImage2PdfFile(tiffImageFileName);

                /* Delete the tiff file once written to the pdf file. */
                tiffFile.delete();
            } catch (IOException e) {
                /*
                 * This is a dirty way to identify the download completion:
                 * the first page that cannot be fetched ends the loop.
                 */
                e.printStackTrace();
                downloadComplete = true;
            }
            pageNumber++;
        }
        tiff2PdfWriter.finalizeBook();
    }
}
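
The two string manipulations the program relies on (cutting the page file name off an image link, and producing the 8-digit page names) can also be written as small helpers. This is only a sketch: the class DliPageUrls, its method names and the example.org link are made up for illustration, and String.format is used here in place of the 100000001-and-substring trick from the program above.

```java
import java.util.Locale;

/** Hypothetical helper, not part of the original program. */
final class DliPageUrls {

    /** Keeps everything up to and including the last '/', dropping the page file name. */
    static String folderUrlOf(String imageUrl) {
        int lastSlash = imageUrl.lastIndexOf('/');
        if (lastSlash < 0) {
            throw new IllegalArgumentException("No '/' found in " + imageUrl);
        }
        return imageUrl.substring(0, lastSlash + 1);
    }

    /** Zero-pads the page number to 8 digits and appends the TIFF extension. */
    static String pageFileName(int pageNumber) {
        return String.format(Locale.ROOT, "%08d.tif", pageNumber);
    }

    public static void main(String[] args) {
        // Made-up link, used only to show the shape of the input.
        String sample = "http://example.org/books/SomeBook/PTIFF/00000005.tif";
        String folder = folderUrlOf(sample);
        // Prints the URL of the first page in the same folder.
        System.out.println(folder + pageFileName(1));
    }
}
```

The folder URL produced this way is what the program expects as args[0].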

2.
/*
* FileName : Tiff2PdfWriter.java
* Purpose : An independent Java class that constructs a pdf file from
* tiff images.
* This java file uses the iText library (iText, a Free Java-PDF Library)
* and its TiffImage codec.
*
* For more information about iText please refer to http://www.lowagie.com/iText/
*/
package org.extracttiffs;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Image;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.pdf.RandomAccessFileOrArray;
import com.lowagie.text.pdf.codec.TiffImage;

public class Tiff2PdfWriter {

    Document pdfDocument = null;

    PdfWriter pdfWriter = null;

    PdfContentByte pdfContentByte = null;

    public boolean initialize(String pdfFileName) {
        try {
            pdfDocument = new Document();
            pdfWriter = PdfWriter.getInstance(pdfDocument,
                    new FileOutputStream(pdfFileName));
        } catch (FileNotFoundException e) {
            System.out.println("Initialization failed " + e.toString());
            return false;
        } catch (DocumentException e) {
            System.out.println("Initialization failed " + e.toString());
            return false;
        }
        /* Reached only when the writer was created successfully. */
        pdfDocument.open();
        pdfContentByte = pdfWriter.getDirectContent();
        return true;
    }

    public void addTiffImage2PdfFile(String tiffImageFileName) {
        RandomAccessFileOrArray randomAccessFile = null;
        int numberOfPagesInTiffImage = 0;

        try {
            randomAccessFile = new RandomAccessFileOrArray(tiffImageFileName);
            numberOfPagesInTiffImage = TiffImage
                    .getNumberOfPages(randomAccessFile);
        } catch (Throwable e) {
            System.out.println("Exception in " + tiffImageFileName + " "
                    + e.getMessage());
        }

        /* A tiff file may hold several pages; iText counts them from 1. */
        for (int i = 0; i < numberOfPagesInTiffImage; ++i) {
            try {
                Image tiffImage = TiffImage.getTiffImage(randomAccessFile,
                        i + 1);
                if (tiffImage != null) {
                    /* Adjust the width and height of the images */
                    if (tiffImage.scaledWidth() > 500
                            || tiffImage.scaledHeight() > 700) {
                        tiffImage.scaleToFit(500, 700);
                    }
                    tiffImage.setAbsolutePosition(20, 20);
                    pdfContentByte.addImage(tiffImage);
                    pdfDocument.newPage();
                }
            } catch (Throwable e) {
                System.out.println("Exception in " + tiffImageFileName
                        + " page " + (i + 1) + " " + e.getMessage());
            }
        }

        /* The file is null if the tiff could not be opened above. */
        if (randomAccessFile != null) {
            try {
                randomAccessFile.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public void finalizeBook() {
        pdfDocument.close();
    }
}
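
The resizing step above shrinks a page only when it is wider than 500 or taller than 700 units. The sketch below shows the arithmetic I would expect behind such a fit, assuming the aspect ratio is preserved and images are never enlarged. FitBox and fit are hypothetical names for illustration; this is not iText code.

```java
/** Hypothetical helper illustrating aspect-preserving scaling. */
final class FitBox {

    /**
     * Returns {width, height} scaled down so that both fit inside
     * maxWidth x maxHeight. Images that already fit are left unchanged.
     */
    static float[] fit(float width, float height, float maxWidth, float maxHeight) {
        // The smaller of the two ratios guarantees both dimensions fit.
        float scale = Math.min(maxWidth / width, maxHeight / height);
        // Never scale up: cap the factor at 1.
        scale = Math.min(1f, scale);
        return new float[] { width * scale, height * scale };
    }

    public static void main(String[] args) {
        // A page of 1000 x 1400 units, fitted into the 500 x 700 box used above.
        float[] size = fit(1000f, 1400f, 500f, 700f);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```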



The iText library version I used to generate the pdf is 2.0.0. Be sure that the CLASSPATH variable includes the iText jar.
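
If you are not sure whether the CLASSPATH is set correctly, a tiny check like the following can tell you before you start a long download. It is only a sketch: ClasspathCheck is a name I made up, and it simply tries to load com.lowagie.text.Document, one of the iText classes that Tiff2PdfWriter imports.

```java
/** Hypothetical helper that reports whether a class can be loaded. */
final class ClasspathCheck {

    /** Returns true if the named class is visible to this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the imports of Tiff2PdfWriter.
        String itextClass = "com.lowagie.text.Document";
        if (isOnClasspath(itextClass)) {
            System.out.println("iText found");
        } else {
            System.out.println("iText missing: add the iText jar to CLASSPATH");
        }
    }
}
```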

If you are using a Java IDE such as NetBeans or Eclipse, running this will be a piece of cake :). But I would also encourage you guys to invoke this program from the command line ;) with the correct CLASSPATH settings.

Did I mention the 2nd and 3rd standard social science textbook of 1949 for the schools in the Dharwad region of Karnataka? I bet you will enjoy reading that book. Browse through these sites to explore the books you need. Hope you enjoy reading the books from DLI and, most importantly, let's hope that their website improves.