Thursday, March 1, 2007

Digital Library Of India

A few days back, I came across the Digital Library Of India. The site mentions somewhere that it was started in '05; shame on me that I didn't have a clue about it. Oh man, let me tell you, this site has a huge collection of books. Quite a few of them were published even before my grandparents were born!@#@!#@!#. Amidst this heap of books, I found many, many books on the Vedas, commentaries on Vedic topics by scholars of the 19th and 20th centuries, Kannada literature patronized by Jain settlers and Vedic scholars, arts, novels, story books, what not. I also found many Sanskrit plays, Abhi-Jnaana-ShaakuntaLam (Kn - ಅಭಿಜ್ಞಾನ ಶಾಕುಂತಳಮ್, Sa - अभिज्ञान शाकुंतळम्) and PratiJna-Yaugandha-Raayanam (Kn - ಪ್ರತಿಜ್ಞಾ ಯೌಗಂಧರಾಯಣಮ್, Sa - प्रतिज्ञा यौगंधरायणम्) being just two of them...

This collection has many of the books that my grandparents, parents and teachers spoke of, but which I had always found very hard to get in book shops, viz. Kabbigara Kaava (Kn - ಕಬ್ಬಿಗರ ಕಾವ), meaning Protector of Poets, Yashodhara Charitre (Kn - ಯಶೋಧರ ಚರಿತ್ರೆ), Adikavi Pampa's works etc. Usually a few poems or quotes from these would appear in our textbooks, and the book would be suggested for reference. Thank God, at last I found a ray of light.

The Digital Library Of India has been hosted at four locations.
1. IIIT Hyderabad
2. IISc Bangalore
3. C-Dac Noida
4. Carnegie Mellon University, USA

Among these, IIIT Hyderabad has the largest collection. Though the websites look shabby as of today, my sincere thanks to everyone who has been part of this project and made this huge repository of Indian books available to everyone across the globe.

Wait now, is it possible to download the books?

The answer is that none of the above sites provides such a facility. So I had to write a simple Java program that does it. I am sharing this program with you all so that you can compile a PDF out of a book yourselves. To compile the TIFF images into a PDF I have used iText, a Free Java-PDF Library. If you ever want to thank someone, my request is that you thank those who actually worked behind this DLI project, and the iText team for the Java-PDF library.

Note : I wrote this program in a hurry to compile books that were of interest to me. It may not satisfy all your needs, such as indexing etc. Use this small program as a reference and feel free to modify it accordingly ;)

1.
/*
* FileName : PdfGenerator.java
* Purpose : To download the Tiff images from Digital Library Of India and
* compile a pdf out of them.
*
* Note : This uses iText Library (iText, a Free Java-PDF Library)
*
* For more Technical and other information about iText please refer to
* http://www.lowagie.com/iText/
*/
package org.extracttiffs;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class PdfGenerator {
    private static final String tiffFileExtension = ".tif";

    private static final String pdfFileExtension = ".pdf";

    /**
     * Downloads the pages of a DLI book one by one and compiles them into a
     * single PDF.
     *
     * @param args
     *            args[0] = URL of the folder that contains the static TIFF
     *            images. If this is the image link,
     *            http://dli.iiit.ac.in//server25/data2/upload/TIL/TIL_OU_MAR_06_HDD_0017/Indian_Languages_Books/200_Series_Kannada/200641_OU_Kabbigara_Kaavan/PTIFF/00000005.tif
     *            remove the page file name (00000005.tif) and keep
     *            everything up to and including http://..../PTIFF/
     *
     *            args[1] = Local folder where the PDF has to be generated
     *
     *            args[2] = Local file name of the PDF to be created
     */
    public static void main(String[] args) {
        if (args.length != 3) {
            System.out.println("Usage : java org.extracttiffs.PdfGenerator <folderURL> <localDirectory> <pdfFileName>");
            return;
        }

        /* URL of the folder that contains the static images */
        String strURL = args[0];
        if (!(strURL.endsWith("/") || strURL.endsWith("\\"))) {
            strURL = strURL + "/";
        }

        /*
         * Local folder for the TIFFs to be downloaded to and the PDF to be
         * created in.
         */
        String strLocalDirectory = args[1];
        if (!(strLocalDirectory.endsWith("/") || strLocalDirectory.endsWith("\\"))) {
            strLocalDirectory = strLocalDirectory + "/";
        }

        String pdfFileName = strLocalDirectory + args[2];
        if (!pdfFileName.endsWith(pdfFileExtension)) {
            pdfFileName = pdfFileName + pdfFileExtension;
        }

        /*
         * In DLI, the page number is an 8-digit, zero-padded number. Counting
         * from the 9-digit 100000001 and dropping the leading "1" gives
         * "00000001", "00000002", ...
         */
        int pageNumber = 100000001;

        Tiff2PdfWriter tiff2PdfWriter = new Tiff2PdfWriter();
        if (!tiff2PdfWriter.initialize(pdfFileName)) {
            return; /* The PDF file could not be created. */
        }

        byte[] buffer = new byte[8192];
        boolean downloadComplete = false;
        while (!downloadComplete) {
            String strPageNumber = String.valueOf(pageNumber).substring(1);
            String individualURL = strURL + strPageNumber + tiffFileExtension;
            String tiffImageFileName = strLocalDirectory + strPageNumber + tiffFileExtension;
            File tiffFile = new File(tiffImageFileName);

            try {
                URLConnection urlconnection = new URL(individualURL).openConnection();
                /*
                 * This is a dirty way to detect the end of the book: asking for
                 * a page that does not exist makes getInputStream() throw an
                 * IOException (e.g. FileNotFoundException for HTTP 404).
                 */
                try (InputStream fis = urlconnection.getInputStream();
                        FileOutputStream fos = new FileOutputStream(tiffFile)) {
                    System.out.println("Download Started for " + individualURL);
                    /* Copy in chunks instead of one byte at a time. */
                    int bytesRead;
                    while ((bytesRead = fis.read(buffer)) != -1) {
                        fos.write(buffer, 0, bytesRead);
                    }
                }
                System.out.println("Single Page Download complete");
                tiff2PdfWriter.addTiffImage2PdfFile(tiffImageFileName);
            } catch (MalformedURLException e) {
                System.out.println("Invalid URL " + e.toString());
                downloadComplete = true;
            } catch (IOException e) {
                System.out.println("Download should be complete " + e.toString());
                downloadComplete = true;
            } finally {
                /* Delete the downloaded TIFF (if any) once it has been added to the PDF. */
                tiffFile.delete();
            }
            pageNumber++;
        }
        tiff2PdfWriter.finalizeBook();
    }
}
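The page-number trick in PdfGenerator (start at 100000001 and drop the leading "1") just produces 8-digit, zero-padded file names. A minimal sketch of the same idea with String.format, assuming pages are numbered consecutively from 00000001.tif and that the first missing page marks the end of the book. The class name and the pageExists check are hypothetical; pageExists stands in for the HTTP request the real program makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Sketch only: builds DLI-style page URLs and stops at the first missing page.
class PageUrls {

    // Adds a trailing "/" if needed, then an 8-digit zero-padded page number plus ".tif".
    // Locale.ROOT keeps the digits ASCII whatever the machine's default locale is.
    static String pageUrl(String folderUrl, int page) {
        String base = folderUrl.endsWith("/") ? folderUrl : folderUrl + "/";
        return base + String.format(Locale.ROOT, "%08d", page) + ".tif";
    }

    // Collects URLs for pages 1, 2, 3, ... until pageExists returns false.
    // pageExists is a hypothetical stand-in for an HTTP check (a 404 means "no such page").
    static List<String> collectPages(String folderUrl, Predicate<String> pageExists) {
        List<String> urls = new ArrayList<>();
        for (int page = 1; ; page++) {
            String url = pageUrl(folderUrl, page);
            if (!pageExists.test(url)) {
                break; // first missing page: assume the book has ended
            }
            urls.add(url);
        }
        return urls;
    }
}
```

For example, with the made-up folder http://example.org/book/PTIFF, pageUrl(..., 5) yields http://example.org/book/PTIFF/00000005.tif, the same name the 100000001 trick produces for page 5.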

2.
/*
* FileName : Tiff2PdfWriter.java
* Purpose : An independent Java class that constructs a PDF file from
* TIFF images.
* This Java file uses the iText library (iText, a Free Java-PDF Library)
* and its TiffImage class.
*
* For more information about iText please refer to http://www.lowagie.com/iText/
*/
package org.extracttiffs;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Image;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.pdf.RandomAccessFileOrArray;
import com.lowagie.text.pdf.codec.TiffImage;

public class Tiff2PdfWriter {

    Document pdfDocument = null;

    PdfWriter pdfWriter = null;

    PdfContentByte pdfContentByte = null;

    /** Creates the PDF file and opens the document; returns false on failure. */
    public boolean initialize(String pdfFileName) {
        try {
            pdfDocument = new Document();
            pdfWriter = PdfWriter.getInstance(pdfDocument, new FileOutputStream(pdfFileName));
        } catch (FileNotFoundException e) {
            System.out.println("Initialization failed " + e.toString());
            return false;
        } catch (DocumentException e) {
            System.out.println("Initialization failed " + e.toString());
            return false;
        }
        pdfDocument.open();
        pdfContentByte = pdfWriter.getDirectContent();
        return true;
    }

    /** Adds every page of the given (possibly multi-page) TIFF file to the PDF. */
    public void addTiffImage2PdfFile(String tiffImageFileName) {
        RandomAccessFileOrArray randomAccessFile = null;
        int numberOfPagesInTiffImage = 0;

        try {
            randomAccessFile = new RandomAccessFileOrArray(tiffImageFileName);
            numberOfPagesInTiffImage = TiffImage.getNumberOfPages(randomAccessFile);
        } catch (Throwable e) {
            System.out.println("Exception in " + tiffImageFileName + " " + e.getMessage());
        }

        /* TIFF page indices in iText start at 1. */
        for (int i = 0; i < numberOfPagesInTiffImage; ++i) {
            try {
                Image tiffImage = TiffImage.getTiffImage(randomAccessFile, i + 1);
                if (tiffImage != null) {
                    /* Shrink large images to fit a 500 x 700 box, keeping the aspect ratio */
                    if (tiffImage.scaledWidth() > 500 || tiffImage.scaledHeight() > 700) {
                        tiffImage.scaleToFit(500, 700);
                    }
                    tiffImage.setAbsolutePosition(20, 20);
                    pdfContentByte.addImage(tiffImage);
                    pdfDocument.newPage();
                }
            } catch (Throwable e) {
                System.out.println("Exception in " + tiffImageFileName + " page " + (i + 1) + " " + e.getMessage());
            }
        }

        /* The file may not have been opened if the constructor above failed. */
        if (randomAccessFile != null) {
            try {
                randomAccessFile.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    /** Closes the document, which writes out the finished PDF. */
    public void finalizeBook() {
        pdfDocument.close();
    }
}
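The scaling step in addTiffImage2PdfFile only shrinks pages that are larger than 500 x 700, and iText's scaleToFit keeps the aspect ratio. A small stand-alone sketch of that arithmetic; the class and method names are made up for illustration and this is not iText code.

```java
// Sketch of the "shrink to fit, keep aspect ratio" arithmetic; not iText code.
class FitScale {

    // Scale factor (at most 1.0) that makes w x h fit inside maxW x maxH.
    static double fitFactor(double w, double h, double maxW, double maxH) {
        if (w <= maxW && h <= maxH) {
            return 1.0; // already fits: leave it alone, like the if-check in Tiff2PdfWriter
        }
        // The tighter of the two ratios wins, so neither side exceeds its limit.
        return Math.min(maxW / w, maxH / h);
    }

    // Returns the scaled {width, height}.
    static double[] fit(double w, double h, double maxW, double maxH) {
        double f = fitFactor(w, h, maxW, maxH);
        return new double[] { w * f, h * f };
    }
}
```

For example, a 2000 x 2000 image gets a factor of min(500/2000, 700/2000) = 0.25 and ends up 500 x 500, while a 400 x 600 image is left alone. Since iText's default Document page size is A4 (about 595 x 842 points), a 500 x 700 image placed at (20, 20) stays on the page.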



The iText library version I used to generate the PDF is 2.0.0. Make sure the iText jar is on your CLASSPATH.

If you are using a Java IDE such as NetBeans or Eclipse, running this would be a piece of cake :). But I would also encourage you guys to invoke this program from the command line ;) with the correct CLASSPATH settings.

Did I mention the 1949 Social Science textbook for 2nd and 3rd standard schools in the Dharwad region of Karnataka? I bet you will enjoy reading that book. Browse through these sites to explore the books you need. I hope you enjoy reading books from DLI, and, most importantly, let's hope their website improves.

8 comments:

Unknown said...

Thank you very much Sunil for helping me download books.
I thought it would be better to publish the download commands for others' benefit:
C:\test>SET CLASSPATH=%CLASSPATH%;C:\TEST\pdfgen.jar;c:\Test\itext-2.0.0.jar
C:\test>echo %CLASSPATH%
C:\test>java org.extracttiffs.PdfGenerator http://dli.iiit.ac.in//server25/data3/upload/SVI_DEC_06/KANNADA/sachitra_saamaajika_abhyaasa_3_neya_iyattegaagi/PTIFF/ C:/test/ samaja.pdf

The above commands have been tested on a Windows XP machine and work fine.

ಸುನಿಲ್ ಜಯಪ್ರಕಾಶ್ said...

@All,
Rohith has a better GUI version of the Java solution. The GUI has text boxes where you can enter the values and then click the Create Book button. I could download one of the 1980 Saptagiri Sampada editions with it. More information can be found on his blog.

Unknown said...

Sunil,

For some of the books we are not able to see PTIFF in the URL. It shows cgi in the URL, and the URL does not change when we navigate through the pages.
Can you please tell me how to download this kind of book?

Thanks
Subbu

ಸುನಿಲ್ ಜಯಪ್ರಕಾಶ್ said...

@subrahmanyam,

Sorry, I could not reply for many days. If you haven't found an alternative yet, please try this. BTW, over the past few days I have been trying a few updates to the Java program. Stay tuned!!!!

What is the URL that needs to be input to the Java program?

If you are using Interface 1
The URL displayed in the address bar will look like
http://dli.iiit.ac.in/cgi-bin/Browse/scripts/use_scripts/advnew/aui/bookreader_india/bookReader_test.cgi?barcode=2030020028541

When the page opens, right-click on the image you see and select Copy Image Location (in Firefox; IE has a similar option). The image location will look like:
http://dli.iiit.ac.in//server25/data2/upload/TIL/TIL_OU_MAR_06_HDD_0017/Indian_Languages_Books/200_Series_Kannada/200225_OU_Haalu_Muulu_19/PTIFF/00000001.tif

The input to the java program should be http://dli.iiit.ac.in//server25/data2/upload/TIL/TIL_OU_MAR_06_HDD_0017/Indian_Languages_Books/200_Series_Kannada/200225_OU_Haalu_Muulu_19/PTIFF/


If you are using Interface 2

The URL in the address bar will look like
http://dli.iiit.ac.in/cgi-bin/Browse/scripts/use_scripts/advnew/aui/bookreader_india/1.cgi?barcode=2020050018014

Copy Image Location on the image gives

http://dli.iiit.ac.in//cgi-bin/Browse/scripts/use_scripts/advnew/aui/TIFF2PNG/getPng.cgi?inputpath=http://dli.iiit.ac.in//server25/data/upload/PAR/IIIT/RMSC_PAR_DVD_171_TO_199/Rmsc-Par-Disk198-kanpur-english/15427%20AA_Modular_Distrubuted_Constraint_Logic_Programming_System/PTIFF/00000002.tif&thumbnail=1&format=0&width=600&height=800

From this, remove everything up to and including "cgi?inputpath=", and keep only http://dli.iiit.ac.in//server25/data/upload/PAR/IIIT/RMSC_PAR_DVD_171_TO_199/Rmsc-Par-Disk198-kanpur-english/15427%20AA_Modular_Distrubuted_Constraint_Logic_Programming_System/PTIFF/

This has to be fed to the Java program.
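Both recipes boil down to simple string surgery on the copied image location. A minimal sketch, assuming the image location looks like one of the two examples (in Interface 2, the inputpath value is not further URL-encoded); the class and method names are hypothetical.

```java
// Sketch: turns a "Copy Image Location" URL from either DLI reader interface
// into the folder URL expected by PdfGenerator (args[0]). Hypothetical helper.
class DliFolderUrl {

    private static final String MARKER = "inputpath=";

    static String toFolderUrl(String imageLocation) {
        String url = imageLocation;
        // Interface 2: the real image path is wrapped in ...getPng.cgi?inputpath=...&thumbnail=...
        int start = url.indexOf(MARKER);
        if (start >= 0) {
            url = url.substring(start + MARKER.length());
            int amp = url.indexOf('&');
            if (amp >= 0) {
                url = url.substring(0, amp); // drop &thumbnail=1&format=0&...
            }
        }
        // Both interfaces: drop the page file name (e.g. 00000001.tif), keep the trailing "/".
        return url.substring(0, url.lastIndexOf('/') + 1);
    }
}
```

Fed the Interface 1 image location above, it returns the URL ending in 200225_OU_Haalu_Muulu_19/PTIFF/; fed the Interface 2 location, it returns the .../PTIFF/ URL given in the previous step.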

ಬನವಾಸಿ ಬಳಗ said...

Do pay a visit to the new blog of Banavasi Balaga, which is committed to the progress of Kannada, Kannadigas and Karnataka. Address: http://enguru.blogspot.com

admin said...

Hi Sunil,

Instead of saving the TIFF to a local folder first and then reading it again to write it to the PDF, can't we write the downloaded TIFF directly to the PDF? That would surely reduce the time to generate a book.
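This suggestion looks workable: each page can be read straight from the URLConnection into memory with a buffer instead of going through a temporary .tif file. A minimal JDK-only sketch of the reading part; the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: reads a whole stream (e.g. urlconnection.getInputStream()) into memory.
class InMemoryDownload {

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        try {
            // Copy in chunks rather than one byte at a time.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            in.close(); // always release the connection's stream
        }
        return out.toByteArray();
    }
}
```

The resulting byte[] could then be used instead of a file name, through a hypothetical overload of addTiffImage2PdfFile that builds its RandomAccessFileOrArray from the bytes (iText's RandomAccessFileOrArray also has a byte[] constructor). This has not been tried against iText 2.0.0, so check that constructor in your version.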

Sangana.

ksanthanamala said...

Please guide me on how to download Tamil books from the Digital Library of India. Please give me step-by-step instructions.

K.Mala
Malaysia

Roopa said...

Hi,
Landed here while googling 'how to download from digital library'.

Can you please share a step-by-step method for the same, to help a non-computer geek like me and maybe many others?

Thanks in advance

Roopa