Web Scraping – what and why

I really hate transcribing stuff from one program to another. Practically, my poor typing skills and a dodgy keyboard don’t help. Philosophically, I don’t see why I should have to type stuff from one window to another. What on earth are computers for?

With most genealogy programs, GEDCOM can be used to transfer the basics from one program to another. Multimedia data causes the most problems, but careful batch editing and some basic scripting can solve many of them.

The biggest problem is extracting data from the many online resources now available on the web, especially the subscription sites. The conventional way is to look at the page in the browser and re-type the data into the family history database, in my case introducing all kinds of errors on the way. What is needed is some way of getting the data out of the browser and into a computer database without it being touched (or typed) by human hand. This can be done: it’s called web scraping.
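To make the idea concrete, here is a minimal sketch in Python (standard library only). It assumes a results page has already been saved from the browser as HTML and that the record is laid out as an ordinary table. The sample page, the `TableRowParser` class and the `table_to_csv` function are all hypothetical, made up for illustration; every real site structures its pages differently, so the parsing would need adapting to each one. The sketch pulls the text out of each table cell and writes the rows as CSV, which a spreadsheet can open.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no cells
                self.rows.append(self._row)
            self._row = None


def table_to_csv(html: str) -> str:
    """Turn the table rows in a saved page into CSV text (nested tables not handled)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample page, for illustration only.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>Relationship</th><th>Age</th></tr>
  <tr><td>John Example</td><td>Head</td><td>42</td></tr>
  <tr><td>Mary Example</td><td>Wife</td><td>40</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_PAGE))
```

Running it prints a header row followed by one line per person, with no retyping involved. Real pages usually need a little more work, such as picking out the right table among several.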

There are many different ways of scraping data from the web, and also putting it back on to other web sites. I hope, in a series of posts, to share some of the ways this can be done.


A Photo Album

I recently “repatriated” a photo album to its owners, who had not seen it for 36 years. For the full story, see the features page on my website: http://andymick.magix.net/public/features.htm

I’m sorry I’ve not posted much recently. My health is not as good as it was, and I don’t have the energy for as much research or posting. The Micklethwaite Family History group on Facebook has also been taking up quite a bit of time and energy.


Missing Micklethwaite Children

In his recent newsletter, Peter Calver at LostCousins described the “fertility” questions on the 1911 census: a married woman was asked to record how many children had been born alive to the marriage and how many were still living. A genealogist can therefore compare this number with the number of children he or she has recorded. My efforts are described below.

Incidentally, if you are interested in family history and haven’t registered with LostCousins, please do consider it. It’s well worthwhile: you don’t get contacts without quite a lot of hard work entering your ancestors and their cousins, but the contacts you do get are of good quality. Peter’s newsletter is also an excellent and informative read. You don’t have to pay to subscribe (although that does help the business); you can simply register. If you do decide to join, please let me refer you, as I get “brownie points”!

So how did I do? I found 51 missing Micklethwaites in my One Name Study. Most families had just one, or sometimes two, missing, but one family (Tom and Ann nee Newsam) had 13 children, of whom only 3 were alive in 1911. I have only 5 of their children in my database, so 8 are missing. Interestingly, Ann supplied this information in error: Tom had died, and the census asked married women, not widows, to enter it, so the enumerator usually crossed out widows’ entries, including this one. Tom’s family appears on several public trees on Ancestry, but none of them has more than the 5 children I have.
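The arithmetic is simple enough to automate across a whole one-name study. Here is a minimal sketch, assuming the 1911 “children born alive” figure and the number of recorded children have already been collected for each couple. The `Family` class and `families_with_missing` function are hypothetical; only the Tom and Ann figures come from above, and the other two families are invented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Family:
    label: str
    born_alive_1911: int  # total children born alive, as declared in 1911
    recorded: int         # children already linked to the couple in the database

    @property
    def missing(self) -> int:
        # Clamped at zero: recording more children than were declared
        # might point to a linking problem rather than to missing children.
        return max(self.born_alive_1911 - self.recorded, 0)


def families_with_missing(families: Iterable[Family]) -> List[Tuple[str, int]]:
    """(label, number missing) for each family with a gap, largest gap first."""
    gaps = [(f.label, f.missing) for f in families if f.missing > 0]
    return sorted(gaps, key=lambda gap: gap[1], reverse=True)


FAMILIES = [
    Family("Tom and Ann nee Newsam", born_alive_1911=13, recorded=5),
    Family("Hypothetical family A", born_alive_1911=4, recorded=3),
    Family("Hypothetical family B", born_alive_1911=2, recorded=2),
]

for label, missing in families_with_missing(FAMILIES):
    print(f"{label}: {missing} missing")
```

Sorting by the size of the gap puts families like Tom and Ann’s, with 13 declared and 5 recorded (8 missing), at the top of the research list.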

The difficulty is matching the missing children to the unattached children in my database. I have registration data for births and deaths, but the index does not give the parents’ details, which can’t be found without buying the certificates (the birth index improved from the September quarter of 1911, when it began to include the mother’s maiden name). Sometimes baptisms can be found, and occasionally burial details too. Just occasionally, there is only one unattached birth in a place where a child is missing. But the majority look set to remain missing.
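The “only one candidate” check can be sketched as a filter over unattached birth-index entries. This assumes the entries have been exported with forename, year and registration district, and that the search window runs from the parents’ marriage to 1911; `BirthEntry`, `candidate_births` and `sole_candidate` are hypothetical names, and the entries below are invented placeholders. A single match is still only a lead, not proof.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BirthEntry:
    """One birth-index line: no parents' names, only forename, year and district."""
    forename: str
    year: int
    district: str


def candidate_births(unattached: List[BirthEntry], district: str,
                     first_year: int, last_year: int) -> List[BirthEntry]:
    """Unattached births registered in the family's district within the given years."""
    return [e for e in unattached
            if e.district == district and first_year <= e.year <= last_year]


def sole_candidate(unattached: List[BirthEntry], district: str,
                   first_year: int, last_year: int) -> Optional[BirthEntry]:
    """Return the entry if exactly one birth fits the place and period, else None."""
    matches = candidate_births(unattached, district, first_year, last_year)
    return matches[0] if len(matches) == 1 else None


# Invented placeholder entries, for illustration only.
UNATTACHED = [
    BirthEntry("Child A", 1885, "District X"),
    BirthEntry("Child B", 1890, "District Y"),
    BirthEntry("Child C", 1893, "District Y"),
    BirthEntry("Child D", 1870, "District X"),
]

print(sole_candidate(UNATTACHED, "District X", 1880, 1911))  # only Child A fits
print(sole_candidate(UNATTACHED, "District Y", 1880, 1911))  # two fit, so None
```

Returning nothing when two or more births fit mirrors the situation above: most missing children cannot be pinned down from the index alone.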

Micklethwaite Ellis Connections

Nigel and I made contact very recently through Genes Reunited (a rare success!). He has Ellis connections, so I started digging in my database to see what I could find.

I knew I had quite a few Ellis entries in the database – in fact it’s equal 8th in the list of most common spouse names:
Smith 37
Wood 24
Jackson 21
Jones 20
Wilson 20
Taylor 19
Robinson 18
Ellis 17
Shaw 17
Walker 17
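The ranking itself, with ties sharing a place (hence “equal 8th”), can be reproduced with Python’s `collections.Counter`. This is a minimal sketch: the counts are the ones in the list above, but in practice they would be tallied from spouse surnames exported from the database, and that export step depends on the program, so it is left out. `competition_ranks` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict


def competition_ranks(counts: Counter) -> Dict[str, int]:
    """Rank names by count; tied names share a place and the next place is skipped."""
    ranks: Dict[str, int] = {}
    previous_count = None
    current_rank = 0
    for position, (name, count) in enumerate(counts.most_common(), start=1):
        if count != previous_count:
            current_rank = position  # a new count starts a new place
            previous_count = count
        ranks[name] = current_rank
    return ranks


# In practice: Counter(list_of_spouse_surnames). Here, the totals listed above.
SPOUSE_COUNTS = Counter({
    "Smith": 37, "Wood": 24, "Jackson": 21, "Jones": 20, "Wilson": 20,
    "Taylor": 19, "Robinson": 18, "Ellis": 17, "Shaw": 17, "Walker": 17,
})

ranks = competition_ranks(SPOUSE_COUNTS)
print(ranks["Ellis"])  # Ellis, Shaw and Walker share 8th place
```

Jones and Wilson come out equal 4th, so Taylor is 6th rather than 5th; the same rule puts Ellis equal 8th.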

Last year I looked into the use of Micklethwaite as a given name, as using surnames as given names was not uncommon in the mid to late 1800s (see also http://andymick.magix.net/public/name.htm). I found a Micklethwaite Ellis to go with the Ellis Micklethwaite (both are listed below)!

So who are these folks? (Numbers in brackets are the identifiers in my database)

Ellis Micklethwaite (m3339), born and died 1878, son of Henry and Harriet nee Ellis (see below). Ellis and Henry are descended from Joseph of Thornhill (m1038).

Ellis Jade Micklethwaite (k1951), born 1995 in Leeds to Gary and Yvette nee Rainford. Ellis and Gary are descended from Jonas of Mirfield (J1).

Nadia Ellis Micklethwaite (k1887), born 1991 in Barnsley to Donna Micklethwaite and Ian Shirt. Nadia and Donna are descended from John of Penistone (m2600).

Albert Ellis married Annie Micklethwaite (m7549) in Dewsbury in 1927. Annie is descended from Elias of Mirfield (E1).

Ben Ellis married Elizabeth Micklethwaite (m1172) in Scisset in 1868. Elizabeth is descended from Maria of Penistone (m5646).

Benjamin Ellis married Rachel Micklethwaite (m2108) in Mirfield in 1797. Rachel is descended from Elias of Mirfield (E1).

Bertha Ellis married George Edward Micklethwaite (k3870) in Middlesbrough in 1957. George is descended from George of Emley (m3131).

Clarence F Ellis married Harriet Micklethwaite (m2706) in Barnsley in 1926. Harriet is the daughter of Frederick (m1425) and Fanny nee Ellis, who married in Barnsley in 1889. A contact has found that Clarence is Fanny’s nephew, so this was a first-cousin marriage. Frederick is descended from Joseph of Thornhill (m1038).

Edwin Ellis married Phoebe Micklethwaite (m1384) in Skelmanthorpe in 1898. Phoebe is descended from Maria of Penistone (m5646).

Ellen Ellis married Richard Micklethwaite (m7977) in Rotherham in 1839. I haven’t yet identified Richard. He could be the son of Jonathan and Ruth of Hatfield, descended from Richard of Cawthorne (JM1025), but I haven’t found him in the 1841 or 1851 census, and by 1861 he is married to Mary Woodley.

Fanny Ellis – see Clarence above.

Hannah Ellis married William Micklethwaite (m92) in Thornhill in 1845. William is descended from Elias of Mirfield (E1).

Harriet Ellis married Henry Micklethwaite (m65) in Thornhill in 1874. Henry is descended from Joseph of Thornhill (m1038).

Jabez Ellis married Julia Micklethwaite (m2115) in Mirfield in 1812. Julia is descended from Elias of Mirfield (E1). She later married George Lee in Mirfield in 1841.

John Ellis married Ann Micklethwaite (JMF11) in Felkirk in 1780. Ann is the daughter of Richard of Felkirk, who is not yet connected to any of the branches.

John E Ellis married Alice A Micklethwaite (m7544) in Barnsley in 1934. Alice is descended from John of Penistone (m2600).

Louisa Elliss married Benjamin Micklethwaite (m5146) of Sheffield in London in 1813. Benjamin is descended from Josias of Silkstone (m5590). Was she related to his second wife, Mary Ann (see below)?

Mary Ellis married Joseph Micklethwaite (m9588) in Penistone in 1785. Joseph is currently unidentified.

Mary Ellis married William Micklethwaite (m2487) in Ackworth in 1806. William is descended from Jonathan of Campsall (m5934).

Mary Ann Ellis married Benjamin Micklethwaite (m5146) in Sheffield in 1833, as his second wife. Benjamin is descended from Josias of Silkstone (m5590). Was she related to his first wife, Louisa (see above)? In 1851 and 1861 she was a widow, keeping house for her brothers John, Edwin, Frederick and Joseph.

Micklethwaite Ellis (m9452) is believed to be the son of Henry and Mercy (or Mary) of Hartshead. The Micklethwaite connection is not yet known.

It’s interesting to see that the Micklethwaite – Ellis connections span many years, many places and several of the main Micklethwaite branches.

2014 in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

A San Francisco cable car holds 60 people. This blog was viewed about 610 times in 2014. If it were a cable car, it would take about 10 trips to carry that many people.


Errors in Marriage Registration

I’ve been reading Michael W Foster’s book “A Comedy of Errors” (ISBN 0473055813). It’s not a particularly riveting read, but it is a fascinating and detailed description of the marriage registration process and of the errors that can happen and have happened therein.

The process starts with the wedding. We always assume that the vicar or minister has correctly written the names of the happy couple and the names of their fathers and the names of the witnesses. But sometimes he didn’t (did you check at your wedding?) and wrote the wrong name or the right name in the wrong place. Sometimes, of course, the bride or groom may not have given him correct information (e.g. age or father’s name), whether meaning to deceive or just not knowing the right details. So even the entry on the original church (or similar) register may have errors.

Then the vicar or minister, or his clerk, sends a copy of that information to the local register office. Every copy is an opportunity for error, and Foster finds the evidence for such errors. The local office copies it to the General Register Office (GRO). There, details are temporarily copied on to a card so the entries can be sorted for the index, and then copied, eventually, into the index itself. More copying errors ensue, as do duplicates where a GRO clerk can’t read someone’s handwriting. Typewritten copies of the indices were later made, where still more errors could occur. Sometimes, too, entries were missed during copying, even whole pages of them.

Foster gives lots of examples of these errors and extrapolates to how many errors there might be, but it’s difficult to extract a round number for the likely error rate.

The moral of the tale: if something doesn’t seem right with a marriage registration, it’s very possible that it isn’t right!

Hello Rebecca

I’ve a new friend: her name is Rebecca. Actually, she’s more formally known as Linux Mint Cinnamon release 17.1. I’ve dumped that awful Win8.1 for something a lot faster and less demanding (particularly of “real estate” on the screen). I’ve not sworn at the computer once since Rebecca took over, something SWMBO is much happier with.

Since getting Linux going 2 weeks ago, I’ve only had to return to Windoze once, and that was because I needed to print something more quickly than I could sort out the printers. (I’ve now done that: you need to put your Linux user in the group “vboxusers” for USB devices to be picked up by VirtualBox.) The real beauty was that the virtual machine file which I use to run Windows XP just moved across to Linux, and I could use it as soon as I’d worked out how to get VirtualBox to recognise existing files.

I’m sure there will be problems ahead and I’ve still got some software to install, but at the moment I’m delighted!


Facebook Group

I created a Facebook group yesterday, called Micklethwaite Family History, so that those of us researching the various branches could work together to solve the issues.

However, most people are reporting that they cannot find it. I do not understand why that is. I will continue investigating!

Edit 18th Dec ’14:

Hopefully it can now be found. I hadn’t added my location to my profile – it seems Facebook selects groups for you based on your location.


Mysteries Solved

Some mysteries are difficult to solve, others are much easier, but they can be painful for a Yorkshireman as they involve spending money! I recently resorted to buying 2 certificates, which enabled me to solve 2 mysteries and join 2 of the unconnected recent branches into the main branches.

Fred Micklethwaite married Theresa Beresford in Sheffield in 1912. The problem was that two brothers, Thomas and Henry, each had a son named Fred, both born about 1888 in the Ecclesfield area. Without inside knowledge, it was impossible to work out which one had married Theresa. The marriage certificate says that Fred’s father was Henry, so I’ve linked Fred and Theresa’s family, much of it in southern England, into the branch of John of Penistone (m2600), born about 1693.

The other large unconnected branch was headed by Harry, who married Mary Roebuck in Leeds in 1935. The name Harry was popular in the early 1900s, so it was difficult to work out which Harry this might be. The certificate explained one of the difficulties: Harry didn’t marry until he was 33, quite a bit older than the usual marrying age. Harry’s father is shown as Henry (deceased), a different Henry from Fred’s father. Harry’s family now forms part of the branch headed by Samuel of Penistone (m2598), born about 1689, and it will push that branch’s tree on to a second page. This also means Samuel has living male descendants, although there appears to be an illegitimacy in the line, which rules out Y-DNA testing for this branch.

I wish all the mysteries were this easy to solve!

Dade’s registers

I’ve just had my first encounter with Dade’s parish registers, and they are as wonderful as people have said! I’ve managed to link a group of unconnected Micklewhites, many of whom were horsekeepers, to Richard of Tadcaster. The registers say Richard was the son of Elias of Mirfield, one of my main branch heads. Excellent news, partly thanks to the latest Findmypast (FMP) offer of a month for £1!

Here’s what FamilySearch has to say about them.

Roy Stockdill posted a list of which parishes used Dade’s Registers.